Abstract. We survey results in algebraic complexity theory, focusing on matrix multiplication. 
Our goals are (i.) to show how open questions in algebraic complexity theory are naturally posed 
as questions in geometry and representation theory, (ii.) to motivate researchers to work on 
these questions, and (iii.) to point out relations with more general problems in geometry. The 
key geometric objects for our study are the secant varieties of Segre varieties. We explain how 
these varieties are also useful for algebraic statistics, the study of phylogenetic invariants, and 
quantum computing. 



1. Introduction 
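1.1. Strassen's algorithm. Let A and B be 2 x 2 matrices 

\[ A = \begin{pmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{pmatrix}, \qquad B = \begin{pmatrix} b^1_1 & b^1_2 \\ b^2_1 & b^2_2 \end{pmatrix}. \]
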
Recall the usual algorithm to calculate the matrix product C = AB: 

(1) \[ c^i_j = a^i_1 b^1_j + a^i_2 b^2_j, \qquad 1 \leq i, j \leq 2. \]

This algorithm uses 8 multiplications, and for n x n matrices it uses $n^3$. 

Question: Is there a "better" algorithm for multiplying matrices? By "better" one could mean 
an algorithm that uses fewer arithmetic operations (+, -, *), or simply fewer multiplications. 
The number of multiplications needed governs the total number of arithmetic operations in 
such a way that asymptotic results depend primarily on the number of multiplications used. 
(See Definition 1.2 for a precise statement.) In this article we focus exclusively on minimizing 
multiplications. (In actual implementations memory cost is also an important factor.) 
In 1969 Strassen [53] made the following discovery. Set 

\[
\begin{aligned}
I &= (a^1_1 + a^2_2)(b^1_1 + b^2_2), \\
II &= (a^2_1 + a^2_2)\, b^1_1, \\
III &= a^1_1 (b^1_2 - b^2_2), \\
IV &= a^2_2 (-b^1_1 + b^2_1), \\
V &= (a^1_1 + a^1_2)\, b^2_2, \\
VI &= (-a^1_1 + a^2_1)(b^1_1 + b^1_2), \\
VII &= (a^1_2 - a^2_2)(b^2_1 + b^2_2).
\end{aligned}
\]

Now check for yourself that if C = AB, then 

\[
\begin{aligned}
c^1_1 &= I + IV - V + VII, \\
c^2_1 &= II + IV, \\
c^1_2 &= III + V, \\
c^2_2 &= I + III - II + VI.
\end{aligned}
\]

Thus the above is an algorithm for multiplying two by two matrices performing only seven 
multiplications. 
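
The check suggested above can also be done mechanically. The following is a minimal sketch, not part of the original exposition: the function names are chosen here for illustration, and the entry $a^i_j$ is stored as `a[i-1][j-1]`. It compares the seven-product formulas with the usual eight-product algorithm on random integer matrices.

```python
import random

def strassen_2x2(a, b):
    """Multiply 2x2 matrices using the seven products I, ..., VII."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    p1 = (a11 + a22) * (b11 + b22)    # I
    p2 = (a21 + a22) * b11            # II
    p3 = a11 * (b12 - b22)            # III
    p4 = a22 * (-b11 + b21)           # IV
    p5 = (a11 + a12) * b22            # V
    p6 = (-a11 + a21) * (b11 + b12)   # VI
    p7 = (a12 - a22) * (b21 + b22)    # VII
    return [[p1 + p4 - p5 + p7, p3 + p5],
            [p2 + p4, p1 + p3 - p2 + p6]]

def standard_2x2(a, b):
    """The usual algorithm with eight multiplications."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def check_strassen(trials=200, seed=0):
    """Return True if both algorithms agree on `trials` random integer inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        b = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        if strassen_2x2(a, b) != standard_2x2(a, b):
            return False
    return True
```

Agreement on random inputs is evidence rather than proof; expanding the seven products by hand gives the actual verification.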

Remark 1.1. Strassen was attempting to prove, by process of elimination, that such an algorithm 
did not exist when he arrived at it. We will see in §3 why the result could have been anticipated 
using elementary algebraic geometry. 

1.2. The complexity of matrix multiplication. In Strassen's algorithm the entries of the 
matrices need not be scalars - they could be elements of an algebra. Let A, B be 4 x 4 matrices, 
and write 

\[ A = \begin{pmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{pmatrix}, \qquad B = \begin{pmatrix} b^1_1 & b^1_2 \\ b^2_1 & b^2_2 \end{pmatrix}, \]

where $a^i_j$, $b^i_j$ are 2 x 2 matrices. We may apply Strassen's algorithm to get the blocks of C = AB 
in terms of the blocks of A, B performing 7 multiplications of 2 x 2 matrices. Since we can apply 
Strassen's algorithm to each block, we can multiply 4 x 4 matrices using $7^2 = 49$ multiplications 
instead of the usual $4^3 = 64$. In fact, if A, B are $2^k \times 2^k$ matrices, we may multiply them using 
$7^k$ multiplications rather than the usual $(2^k)^3$. Even if n is not a power of two, we can still save 
multiplications asymptotically by enlarging the dimensions of our matrices, placing zeros in the 
new entries, to obtain matrices whose size is a power of two. Asymptotically we can multiply 
n x n matrices using $O(n^{\log_2 7}) \simeq O(n^{2.81})$ operations: let $n = 2^k$ and write $7^k = (2^k)^a$, so 
$k \log_2 7 = a k \log_2 2$ and we obtain $a = \log_2 7$. 
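
The recursion just described can be sketched as follows. This is an illustrative implementation under simple assumptions (matrices as lists of lists, zero padding to the next power of two, helper names invented here), not an optimized one; the counter records only scalar multiplications, which is the quantity the text tracks.

```python
def add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]

def sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]

def split(m):
    """Cut a square matrix of even size into four equal blocks."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])

def strassen(a, b, counter):
    """Multiply 2^k x 2^k matrices; counter[0] tallies scalar multiplications."""
    if len(a) == 1:
        counter[0] += 1
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # The seven block products; the order of factors is preserved because
    # blocks do not commute.
    p1 = strassen(add(a11, a22), add(b11, b22), counter)
    p2 = strassen(add(a21, a22), b11, counter)
    p3 = strassen(a11, sub(b12, b22), counter)
    p4 = strassen(a22, sub(b21, b11), counter)
    p5 = strassen(add(a11, a12), b22, counter)
    p6 = strassen(sub(a21, a11), add(b11, b12), counter)
    p7 = strassen(sub(a12, a22), add(b21, b22), counter)
    c11 = add(sub(add(p1, p4), p5), p7)
    c12 = add(p3, p5)
    c21 = add(p2, p4)
    c22 = add(sub(add(p1, p3), p2), p6)
    return ([r + s for r, s in zip(c11, c12)] +
            [r + s for r, s in zip(c21, c22)])

def multiply(a, b):
    """Pad n x n inputs with zeros to a power of two, then recurse.

    Returns the product and the number of scalar multiplications used.
    """
    n = len(a)
    size = 1
    while size < n:
        size *= 2

    def pad(m):
        return ([row + [0] * (size - n) for row in m] +
                [[0] * size for _ in range(size - n)])

    counter = [0]
    c = strassen(pad(a), pad(b), counter)
    return [row[:n] for row in c[:n]], counter[0]
```

For 4 x 4 inputs the counter ends at 49, and a 3 x 3 input padded to 4 x 4 also uses 49, which is more than the 27 of the usual algorithm: padding only pays off asymptotically.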

Definition 1.2. The exponent of matrix multiplication is 

\[ \omega := \inf\{ h \in \mathbb{R} \mid Mat_{n \times n} \text{ may be multiplied using } O(n^h) \text{ scalar multiplications} \}. \]

Strassen's algorithm shows $\omega \leq \log_2 7 < 2.81$. 

Remark 1.3. If one replaces the phrase "scalar multiplications" with the phrase "arithmetic 
operations" in the definition, $\omega$ is unchanged; see [14], Proposition 15.1. 

Matrix multiplication of square matrices is a bilinear map that we denote 
$M_{n,n,n} : \mathbb{C}^{n^2} \times \mathbb{C}^{n^2} \to \mathbb{C}^{n^2}$. (In this article we restrict our attention to the complex numbers, so e.g., all vector spaces 
are finite dimensional vector spaces over $\mathbb{C}$.) When discussing a minimal number of arithmetic 
operations (or multiplications) for executing a bilinear map, it is usually within the context of a 
class of algorithms. A natural class of algorithms for executing a bilinear map is as follows: let 
A, B, C be vector spaces, let $A^* := \{ f : A \to \mathbb{C} \mid f \text{ is linear} \}$ denote the dual vector space (and 
similarly for B), and let $T : A \times B \to C$ be a bilinear map. Choose $\alpha^i \in A^*$, $\beta^i \in B^*$, $c_i \in C$ 
such that $T(v,w) = \sum_{i=1}^{r} \alpha^i(v) \beta^i(w) c_i$. The minimal number r over all such presentations of 
T is called the rank of T and denoted $R(T)$. A related notion, more natural to geometry and 
defined in §2, is that of border rank, denoted $\underline{R}(T)$. Another concept that comes into play when 
discussing the space of all bilinear maps $A \times B \to C$ is the typical rank, which is the rank of a 
generic bilinear map $A \times B \to C$. 

Strassen's algorithm shows that the rank of the multiplication of two by two matrices is at 
most seven, and Winograd [57] proved that in fact it equals seven. 

1.3. Overview. To examine the complexity of matrix multiplication more geometrically, we 
first, in §2, rephrase it using tensors. Next, in §3, we introduce algebraic varieties which stratify 
the space of tensors, the secant varieties of Segre varieties. (The above-mentioned border rank of 
a tensor describes its location with respect to this stratification.) This is done in two steps, first 
introducing secant varieties of any algebraic variety in §3.1, then specializing to Segre varieties 
in §3.2. We also rephrase the main open problems in the complexity of matrix multiplication in 
terms of secant varieties of Segre varieties. In §3.3 we summarize the known results. 

Before discussing those results in detail, we take two detours. In the first, we describe two 
problems from algebraic geometry where secant varieties arise: the polynomial Waring problem 
and Hartshorne's conjecture on linear normality. These are described in §4. In the second, we 
describe other applications of secant varieties of Segre varieties - to algebraic statistics (especially 
the study of phylogenetic invariants) and quantum computing, which is done in §5. These detours 
will allow the reader to place the topics discussed in the remainder of the paper in a larger 
mathematical context. 

In §6 we describe Strassen's equations for secant varieties of Segre varieties and their use in 
proving lower bounds for rank and border rank. In particular, we present a new proof of Bläser's 
5/2-Theorem. We rephrase Strassen's equations invariantly in §10 and describe generalizations. 

While it is well known that the limit of a family of secant lines is a tangent line (or a secant 
line itself), exactly what can be in the limit of a secant k-plane is not known. We discuss what 
is known about this problem in §7 and show how to use this knowledge to prove upper bounds 
for the complexity of matrix multiplication in §8.1. (We explain how to use such limits to 
prove lower bounds in the discussion below Theorem 3.9.) A group-theoretic approach to upper 
bounds is described briefly in §8.2. 

We discuss dimensions of secant varieties of Segre varieties in §9, focusing on the use of 
Terracini's Lemma. 

Any proper study of varieties invariant under a group action, e.g., the secant varieties of Segre 
varieties, should exploit representation theory. The representation theory relevant to this study 
is discussed in §11. Representation theory is the most important tool discussed in this article. 

A common technique in geometry is to understand a complicated geometric object via the 
construction of auxiliary objects that are more tractable, and the problem at hand is no exception. 
We describe two such objects in §12. 

In §13 we describe a collection of techniques developed by Weyman for the study of 
G-varieties and their application to secant varieties of Segre varieties. (A G-variety is a variety 
invariant under the action of an algebraic group G.) These techniques find the entire minimal 
free resolution of the ideal of a variety and describe the nature of its singularities. 

Finally, in an appendix, §14, we give nontraditional and more invariant presentations of two 
standard notions in complexity theory - multiplicative complexity and separations. 

1.4. Acknowledgments. Many colleagues generously helped the author in the preparation of 
this article. Special thanks are due to E. Allman, M. Bläser, P. Bürgisser, L. Garcia, D. Gross, 
J. Morton, G. Ottaviani, C. Robles and the anonymous referee for numerous suggestions to 
improve this article. In particular, the new proof of Bläser's theorem arose out of discussions 
with P. Bürgisser. 

2. Tensor formulation 
Recall that for vector spaces $V$, $V_j$, 

\[ V^* := \{ f : V \to \mathbb{C} \mid f \text{ is linear} \}, \]
\[ V_1 \otimes \cdots \otimes V_n := \{ f : V_1^* \times \cdots \times V_n^* \to \mathbb{C} \mid f \text{ is linear in each factor} \}. \]

Given $v_j \in V_j$, $\alpha_j \in V_j^*$, define $v_1 \otimes \cdots \otimes v_n \in V_1 \otimes \cdots \otimes V_n$ by 
$v_1 \otimes \cdots \otimes v_n (\alpha_1, \ldots, \alpha_n) = \alpha_1(v_1) \cdots \alpha_n(v_n)$. An element $f \in V_1 \otimes V_2$, i.e., a bilinear map 
$f : V_1^* \times V_2^* \to \mathbb{C}$, may also be considered as a linear map 

\[ f : V_1^* \to V_2, \qquad \alpha \mapsto f(\alpha, \cdot), \]

where $f(\alpha, \cdot) \in (V_2^*)^* = V_2$, i.e., for $\beta \in V_2^*$, $f(\alpha, \cdot)(\beta) = f(\alpha, \beta)$. 

Definition 2.1. Let $V_1, \ldots, V_k$ be vector spaces. An element $z \in V_1 \otimes \cdots \otimes V_k$ is called 
decomposable if there exist $v_i \in V_i$ such that $z = v_1 \otimes \cdots \otimes v_k$. Define the rank of an element 
$T \in V_1 \otimes V_2 \otimes \cdots \otimes V_k$ to be the minimal number r such that $T = \sum_{u=1}^{r} z_u$ with each $z_u$ 
decomposable. We refer to an explicit expression for a tensor T as a sum of r monomials as a 
computation of T of length r, and sometimes use a separate symbol to denote the realization of T as a 
computation. This terminology is consistent with the definition of the rank of a linear map $T : V_1^* \to V_2$ 
(i.e., an element $T \in V_1 \otimes V_2$) and the rank of a bilinear map $T : V_1^* \times V_2^* \to V_3$ given in §1.2 
(i.e., an element $T \in V_1 \otimes V_2 \otimes V_3 = A^* \otimes B^* \otimes C$). Note that the length of a computation of a 
tensor is unchanged if we make changes of bases in the vector spaces $V_i$. 

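2.1. Strassen's algorithm as a tensor. The standard algorithm for the multiplication of two 
by two matrices may be written in terms of tensors as follows: let A, B, C each denote the space of 2 x 2 matrices; 
give A the standard basis $a^i_j$, where $a^i_j$ is the matrix with a 1 in the (i, j)-th slot and zeros elsewhere, 
and let $\alpha^i_j$ denote the corresponding elements of the dual basis of $A^*$. Similarly for B, C. Then 
the standard algorithm is (compare with (1)): 

\[
\begin{aligned}
M_{2,2,2} = {} & \alpha^1_1 \otimes \beta^1_1 \otimes c^1_1 + \alpha^1_2 \otimes \beta^2_1 \otimes c^1_1 + \alpha^2_1 \otimes \beta^1_1 \otimes c^2_1 + \alpha^2_2 \otimes \beta^2_1 \otimes c^2_1 \\
& + \alpha^1_1 \otimes \beta^1_2 \otimes c^1_2 + \alpha^1_2 \otimes \beta^2_2 \otimes c^1_2 + \alpha^2_1 \otimes \beta^1_2 \otimes c^2_2 + \alpha^2_2 \otimes \beta^2_2 \otimes c^2_2
\end{aligned}
\]

and Strassen's algorithm is 

\[
\begin{aligned}
M_{2,2,2} = {} & (\alpha^1_1 + \alpha^2_2) \otimes (\beta^1_1 + \beta^2_2) \otimes (c^1_1 + c^2_2) + (\alpha^2_1 + \alpha^2_2) \otimes \beta^1_1 \otimes (c^2_1 - c^2_2) \\
& + \alpha^1_1 \otimes (\beta^1_2 - \beta^2_2) \otimes (c^1_2 + c^2_2) + \alpha^2_2 \otimes (-\beta^1_1 + \beta^2_1) \otimes (c^2_1 + c^1_1) \\
& + (\alpha^1_1 + \alpha^1_2) \otimes \beta^2_2 \otimes (-c^1_1 + c^1_2) + (-\alpha^1_1 + \alpha^2_1) \otimes (\beta^1_1 + \beta^1_2) \otimes c^2_2 \\
& + (\alpha^1_2 - \alpha^2_2) \otimes (\beta^2_1 + \beta^2_2) \otimes c^1_1.
\end{aligned}
\]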
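
That the eight-term and seven-term expressions describe the same tensor can be checked coordinate by coordinate. The sketch below is an illustration under conventions chosen here, not taken from the source: a linear form or matrix is stored as a coefficient vector of length 4, with the (i, j) slot at position 2(i-1) + (j-1), and the helper names `form`, `tensor_sum` and `standard_tensor` are hypothetical.

```python
def form(*terms):
    """Coefficient vector of a combination of basis elements.

    Each term is (coefficient, i, j), referring to the (i, j) slot of a
    2 x 2 matrix, stored at position 2*(i-1) + (j-1).
    """
    v = [0, 0, 0, 0]
    for coef, i, j in terms:
        v[2 * (i - 1) + (j - 1)] += coef
    return v

# One triple (alpha-part, beta-part, c-part) per rank one term of Strassen's algorithm.
STRASSEN_TERMS = [
    (form((1, 1, 1), (1, 2, 2)), form((1, 1, 1), (1, 2, 2)), form((1, 1, 1), (1, 2, 2))),
    (form((1, 2, 1), (1, 2, 2)), form((1, 1, 1)), form((1, 2, 1), (-1, 2, 2))),
    (form((1, 1, 1)), form((1, 1, 2), (-1, 2, 2)), form((1, 1, 2), (1, 2, 2))),
    (form((1, 2, 2)), form((-1, 1, 1), (1, 2, 1)), form((1, 2, 1), (1, 1, 1))),
    (form((1, 1, 1), (1, 1, 2)), form((1, 2, 2)), form((-1, 1, 1), (1, 1, 2))),
    (form((-1, 1, 1), (1, 2, 1)), form((1, 1, 1), (1, 1, 2)), form((1, 2, 2))),
    (form((1, 1, 2), (-1, 2, 2)), form((1, 2, 1), (1, 2, 2)), form((1, 1, 1))),
]

def tensor_sum(terms):
    """Sum of the rank one tensors u (x) v (x) w as a 4 x 4 x 4 array."""
    total = [[[0] * 4 for _ in range(4)] for _ in range(4)]
    for u, v, w in terms:
        for i in range(4):
            for j in range(4):
                for k in range(4):
                    total[i][j][k] += u[i] * v[j] * w[k]
    return total

def standard_tensor():
    """The eight-term tensor: one term for each product a^i_j b^j_k in c^i_k."""
    t = [[[0] * 4 for _ in range(4)] for _ in range(4)]
    for i in (1, 2):
        for j in (1, 2):
            for k in (1, 2):
                t[2 * (i - 1) + (j - 1)][2 * (j - 1) + (k - 1)][2 * (i - 1) + (k - 1)] += 1
    return t
```

Comparing `tensor_sum(STRASSEN_TERMS)` with `standard_tensor()` entry by entry confirms that the seven rank one terms add up to the matrix multiplication tensor, so its rank is at most seven.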

2.2. Approximate algorithms. An approximate algorithm for a tensor T is a sequence of 
algorithms, usually of lower rank tensors, that converge to an algorithm for T. The border rank 
of a tensor T is the lowest rank of tensors in such sequences and is denoted $\underline{R}(T)$. Note that 
rank and border rank can indeed be different; consider the following example: 

(4) \[ T = a_1 \otimes b_1 \otimes c_1 + a_1 \otimes b_1 \otimes c_2 + a_1 \otimes b_2 \otimes c_1 + a_2 \otimes b_1 \otimes c_1. \]

One can show that $R(T) = 3$, but we can approximate T as closely as we like by tensors of rank 
two as follows. Let 

(5) \[ T(\epsilon) = \frac{1}{\epsilon} \left[ (\epsilon - 1)\, a_1 \otimes b_1 \otimes c_1 + (a_1 + \epsilon a_2) \otimes (b_1 + \epsilon b_2) \otimes (c_1 + \epsilon c_2) \right] \]

and allow $\epsilon \to 0$, so $\underline{R}(T) \leq 2$ (in fact equality holds). The geometry of this limit is discussed 
in §3.2. 
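
The convergence of the rank two tensors T(ε) to T can be observed numerically. The following sketch uses exact rational arithmetic and represents a tensor in C^2 ⊗ C^2 ⊗ C^2 as a 2 x 2 x 2 nested list in the basis a_1, a_2 (and similarly for b, c); the function names are illustrative assumptions.

```python
from fractions import Fraction

def rank_one(u, v, w):
    """The decomposable tensor u (x) v (x) w as a nested list."""
    return [[[x * y * z for z in w] for y in v] for x in u]

def target():
    """T = a1 b1 c1 + a1 b1 c2 + a1 b2 c1 + a2 b1 c1, equation (4)."""
    e1, e2 = [1, 0], [0, 1]
    terms = [(e1, e1, e1), (e1, e1, e2), (e1, e2, e1), (e2, e1, e1)]
    t = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
    for u, v, w in terms:
        r = rank_one(u, v, w)
        for i in range(2):
            for j in range(2):
                for k in range(2):
                    t[i][j][k] += r[i][j][k]
    return t

def approximation(eps):
    """T(eps) from equation (5): a sum of two rank one tensors, scaled by 1/eps."""
    eps = Fraction(eps)
    first = rank_one([eps - 1, 0], [1, 0], [1, 0])   # (eps - 1) a1 b1 c1
    second = rank_one([1, eps], [1, eps], [1, eps])   # (a1 + eps a2)(b1 + eps b2)(c1 + eps c2)
    return [[[(first[i][j][k] + second[i][j][k]) / eps for k in range(2)]
             for j in range(2)] for i in range(2)]

def max_error(eps):
    """Largest entrywise difference between T(eps) and T."""
    t, s = target(), approximation(eps)
    return max(abs(s[i][j][k] - t[i][j][k])
               for i in range(2) for j in range(2) for k in range(2))
```

The entries of T(ε) - T are 0, ε or ε², so the error shrinks linearly in ε even though each T(ε) has rank at most two and T itself has rank three.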
3. Geometric formulation 

3.1. Secant varieties. Let V be a vector space and let $\mathbb{P}V$ be the associated projective space of 
lines through the origin in V, so we have a map $\pi : V \setminus 0 \to \mathbb{P}V$. If $v \in V \setminus 0$, let $[v] = \pi(v) \in \mathbb{P}V$, 
and for $Z \subset \mathbb{P}V$, let $\hat{Z} = \pi^{-1}(Z) \subset V$. For scale invariant sets $U \subset V \setminus 0$, write $\mathbb{P}U$ for $\pi(U)$. 
We use projective space in addition to vector spaces because the properties we are interested 
in (rank, border rank) are scale invariant. Because we go back and forth between vector and 
projective spaces, many objects end up being decorated with hats and "P"s. 

For our purposes, a variety $X \subset \mathbb{P}V$ is the common zero locus in $\mathbb{P}V$ of a collection of 
homogeneous polynomials on V. Given a variety X, we will construct a sequence of auxiliary 
varieties $X \subset \sigma_2(X) \subset \cdots \subset \sigma_f(X) = \mathbb{P}V$, called the secant varieties of X, which determine 
a stratification of $\mathbb{P}V$. This stratification will generalize the stratification of the space of m x 
n matrices by rank. When $V = A_1 \otimes \cdots \otimes A_n$ and X is the projectivization of the set of 
decomposable tensors, the stratification will coincide with the stratification of tensors by their 
border rank, and f is the typical rank mentioned in §1 and defined below. 

For readers not accustomed to secant varieties, we begin with several special cases to help 
visualize them. Recall that projective space $\mathbb{P}V$ has the property that, given any two distinct 
points $p, q \in \mathbb{P}V$, there is a unique line, i.e., a linearly embedded $\mathbb{P}^1 \subset \mathbb{P}V$ containing p and q, 
which we denote $\mathbb{P}^1_{pq}$. Let $C \subset \mathbb{P}V$ be a smooth curve (one-dimensional variety) and $q \in \mathbb{P}V$ a 
point. Let $J(q, C) \subset \mathbb{P}V$ denote the cone over C with vertex q, which by definition contains the 
union of all points on all lines containing q and a point of C. More precisely, $J(q, C)$ denotes 
the closure of the set of such points. It is only necessary to take the closure when $q \in C$, as in 
this case one also includes the points on the tangent line to C at q, because, as anyone who has 
ever taught calculus knows, the tangent line is the limit of secant lines $\mathbb{P}^1_{q x_j}$ as $x_j \to q$. Define 
$J(q, Z)$ similarly for $Z \subset \mathbb{P}V$, a variety of any dimension. Unless Z is a linear space and $q \in Z$, 
$\dim J(q, Z) = \dim Z + 1$. 

Definition 3.1. The join of $Y, Z \subset \mathbb{P}V$ is 

\[ J(Y, Z) = \overline{\bigcup_{x \in Y,\, y \in Z,\, x \neq y} \mathbb{P}^1_{xy}}. \]

Here the overline denotes Zariski closure, i.e., if $U \subset \mathbb{P}V$ is a subset, then $\overline{U}$ is the common 
zero set of all homogeneous polynomials vanishing on U. The same set is obtained if one takes 
the closure in the usual topology, but the Zariski closure is more useful when dealing with 
polynomials. We may think of $J(Y, Z)$ as the union of the cones $\bigcup_{q \in Y} J(q, Z)$ (or as the union 
of the cones over Y with vertices points of Z). 

If Y = Z, we call $\sigma_2(Y) = J(Y, Y)$ the secant variety of Y. By the discussion above, $\sigma_2(Y)$ 
contains all points of all secant and tangent lines to Y. Similarly, define the join of k varieties 
to be the closure of the union of the corresponding $\mathbb{P}^{k-1}$'s, or by induction as $J(Y_1, \ldots, Y_k) = 
J(Y_1, J(Y_2, \ldots, Y_k))$. Define the k-th secant variety of Y to be $\sigma_k(Y) = J(Y, \ldots, Y)$, the join of 
k copies of Y. For smooth varieties $Y \subset \mathbb{P}V$, let $\tau(Y)$ denote the union of all points on all 
embedded tangent lines to Y. Usually $\tau(Y)$ is a hypersurface in $\sigma_2(Y)$. 

Remark 3.2. The expected dimension of $J(Y, Z)$ is $\min\{\dim Y + \dim Z + 1, \dim \mathbb{P}V\}$ because a 
point $x \in J(Y, Z)$ is obtained by picking a point of Y, a point of Z, and a point on the line 
joining the two points. This expectation fails if and only if a general point of $J(Y, Z)$ lies on a 
family of lines intersecting Y and Z, as when this happens one can vary the points on Y and Z 
used to form the secant line without varying the point x. 

Similarly, the expected dimension of $\sigma_r(Y)$ is $r(\dim Y) + r - 1$, which fails if and only if a 
general point of $\sigma_r(Y)$ lies on a family of secant $\mathbb{P}^{r-1}$'s to Y. 

Definition 3.3. For a variety $X \subset \mathbb{P}V$, and point $p \in \mathbb{P}V$, the X-rank of p is the smallest 
number r such that p is in the linear span of r points of X. Thus $\sigma_r(X)$ is the Zariski closure 
of the set of points of X-rank r. The X-border rank of p is the smallest r such that $p \in \sigma_r(X)$. 
The typical X-rank of $\mathbb{P}V$ is the smallest r such that $\sigma_r(X) = \mathbb{P}V$. 

3.2. The Segre variety and border rank. Define $Seg(\mathbb{P}V_1 \times \mathbb{P}V_2) \subset \mathbb{P}(V_1 \otimes V_2)$, the 
(two-factor) Segre variety, to be the projectivization of all the rank one elements of $V_1 \otimes V_2$. Here Seg 
is the injective map 

\[ Seg : \mathbb{P}V_1 \times \mathbb{P}V_2 \to \mathbb{P}(V_1 \otimes V_2), \qquad ([v_1], [v_2]) \mapsto [v_1 \otimes v_2], \]

which, in bases, corresponds to multiplying a column vector (defined up to scale) with a row 
vector (defined up to scale) to get a rank one rectangular matrix (defined up to scale). Note 
that $\sigma_r(Seg(\mathbb{P}V_1 \times \mathbb{P}V_2))$ is isomorphic to the set of ($\dim V_1 \times \dim V_2$) matrices of rank at most 
r, as the rank at most r matrices are exactly those that can be written as the sum of r matrices 
of rank one. 

More generally, the projectivization of the set of decomposable tensors in $V_1 \otimes \cdots \otimes V_n$, i.e., 
$\mathbb{P}\{ T \in V_1 \otimes \cdots \otimes V_n \mid \exists v_j \in V_j,\ T = v_1 \otimes \cdots \otimes v_n \}$, may be identified with the product 
$\mathbb{P}V_1 \times \cdots \times \mathbb{P}V_n$. Let $Seg(\mathbb{P}V_1 \times \cdots \times \mathbb{P}V_n) \subset \mathbb{P}(V_1 \otimes \cdots \otimes V_n)$ denote the corresponding variety, the 
(n-factor) Segre variety. 

For any variety X, a point of $\sigma_2(X)$ is a point on a limit of secant lines, so if X is smooth, 
the point is either on X, on a secant line, or on a tangent line to X. Equation (5), when 
projectivized, is a curve of points on secant lines of $Seg(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1)$ limiting to a point on a 
tangent line to $Seg(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1)$, i.e., a point of $\tau(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$. 

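We can now give geometric formulations of the concepts introduced in §1 and §2: 

• The border rank of a tensor $T \in V_1 \otimes \cdots \otimes V_n$, $\underline{R}(T)$, defined in §2.2 above, is the smallest 
r such that $[T] \in \sigma_r(Seg(\mathbb{P}V_1 \times \cdots \times \mathbb{P}V_n))$. 

• The border rank of matrix multiplication 

\[ M_{m,n,p} : (\mathbb{C}^{m*} \otimes \mathbb{C}^n) \times (\mathbb{C}^{n*} \otimes \mathbb{C}^p) \to (\mathbb{C}^{m*} \otimes \mathbb{C}^p) \]
is the smallest r such that 

\[ [M_{m,n,p}] \in \sigma_r(Seg(\mathbb{P}(\mathbb{C}^m \otimes \mathbb{C}^{n*}) \times \mathbb{P}(\mathbb{C}^n \otimes \mathbb{C}^{p*}) \times \mathbb{P}(\mathbb{C}^{m*} \otimes \mathbb{C}^p))). \]

• The exponent of matrix multiplication is 

\[ \lim_{n \to \infty} \log_n \left( \min \{ r \mid [M_{n,n,n}] \in \sigma_r(Seg(\mathbb{P}^{n^2-1} \times \mathbb{P}^{n^2-1} \times \mathbb{P}^{n^2-1})) \} \right). \]

• Upper bounds for border rank for a given n can be proven by finding values of r such 
that $[M_{n,n,n}] \in \sigma_r(Seg(\mathbb{P}^{n^2-1} \times \mathbb{P}^{n^2-1} \times \mathbb{P}^{n^2-1}))$ and lower bounds by finding values of r 
such that $[M_{n,n,n}] \notin \sigma_r(Seg(\mathbb{P}^{n^2-1} \times \mathbb{P}^{n^2-1} \times \mathbb{P}^{n^2-1}))$. 

• The typical rank of an element of $\mathbb{C}^a \otimes \mathbb{C}^b \otimes \mathbb{C}^c$ is the smallest r such that 
$\sigma_r(Seg(\mathbb{P}^{a-1} \times \mathbb{P}^{b-1} \times \mathbb{P}^{c-1})) = \mathbb{P}(\mathbb{C}^a \otimes \mathbb{C}^b \otimes \mathbb{C}^c)$. 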

3.3. What is known regarding matrix multiplication. The problem of determining the 
typical rank for the spaces that include the multiplication of square matrices has been completely 
solved: 

Theorem 3.4 (Lickteig [31]). For all $n \neq 3$, 

\[ \dim \sigma_r(Seg(\mathbb{P}^{n-1} \times \mathbb{P}^{n-1} \times \mathbb{P}^{n-1})) = \min\{ r(3n - 2) - 1,\ n^3 - 1 \}. \]

In particular note that Theorem 3.4 shows that Strassen's algorithm for 2 x 2 matrices could 
have been anticipated, as $\sigma_7(Seg(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3)) = \mathbb{P}(\mathbb{C}^4 \otimes \mathbb{C}^4 \otimes \mathbb{C}^4)$. We outline the proof of 
Theorem 3.4 and discuss what is known about typical rank in §9. 
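
The arithmetic behind this remark is short enough to script. The sketch below evaluates the dimension formula of Theorem 3.4 and finds the smallest r for which the secant variety fills the ambient space; the function names are hypothetical, and the case n = 3 is excluded because the theorem does not cover it.

```python
def secant_dim(r, n):
    """dim sigma_r(Seg(P^{n-1} x P^{n-1} x P^{n-1})) per Theorem 3.4 (n != 3)."""
    if n == 3:
        raise ValueError("Theorem 3.4 is stated only for n != 3")
    return min(r * (3 * n - 2) - 1, n ** 3 - 1)

def typical_rank(n):
    """Smallest r such that sigma_r fills P(C^n (x) C^n (x) C^n), for n != 3."""
    r = 1
    while secant_dim(r, n) < n ** 3 - 1:
        r += 1
    return r
```

For 2 x 2 matrix multiplication the relevant spaces have dimension 4, and `typical_rank(4)` returns 7: the sixth secant variety has dimension 59 inside a 63-dimensional projective space, while the seventh fills it. A generic tensor in that space therefore has rank seven, which is why an algorithm with seven multiplications is at least plausible.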

For the n = 3 case, we have: 

Theorem 3.5 (Strassen, [52]). $\sigma_4(Seg(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2))$ is a hypersurface of degree nine. 

This case was solved by finding an explicit equation vanishing on $\sigma_4(Seg(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2))$. In 
§6 we discuss this equation and its consequences for matrix multiplication. 

The best lower bound on the border rank of matrix multiplication is: 

Theorem 3.6 (Lickteig 03]). $\underline{R}(M_{m,m,m}) \geq \frac{3m^2}{2} + \frac{m}{2} - 1$. 

While we do not provide Lickteig's proof here, we remark that implicit in his proof is the 
presence of auxiliary varieties which we believe will play a central role in future work. §12 
describes some of these varieties, including the subspace variety that is implicit in his proof. 

The best lower bounds on the rank of matrix multiplication are: 

Theorem 3.7 (Bläser [10]). $R(M_{m,m,m}) \geq \frac{5}{2} m^2 - 3m$. 

A new proof of Bläser's theorem is presented in §6.2. Bläser has also proved that $R(M_{3,3,3}) \geq 
19$ [11], and we discuss the main tool in the proof of Bläser's 19-theorem in §14.2. 

The best upper bound for the exponent of matrix multiplication is $\omega < 2.38$, due to Coppersmith 
and Winograd [24]. They use methods of Strassen [53]. We do not discuss these asymptotic 
bounds as we have no geometric interpretation for them. However, an earlier asymptotic bound 
due to Schönhage [48] does have relations with geometry. We discuss the geometric aspect of 
Schönhage's argument in §8.1 and present his explicit approximate algorithm for multiplying 
three by three matrices using 21 multiplications. 

There is also an algorithm for multiplying 3x3 matrices using 23 multiplications due to 
Laderman [M] which we do not discuss. 

The only case where the exact rank and border rank are known for the multiplication of 
square matrices is that of two by two matrices: 

Theorem 3.8 (Winograd [57]). $R(M_{2,2,2}) = 7$. 

Hopcroft and Kerr [31] proved Theorem 3.8 in the case of algorithms with integer coefficients. 

While we do not discuss the original proof of Theorem 3.8, an alternative proof is a 
consequence of a theorem of Brockett and Dobkin [13] that the rank of the multiplication in any 
simple algebra is at least twice the dimension of the algebra minus one. A proof of the 
Brockett-Dobkin theorem, due to Baur and presented in [12], proceeds by splitting any putative simpler 
algorithm several times to eventually obtain a contradiction by producing a right ideal that is 
contained in a left ideal. 

Theorem 3.9 ([36]). $\underline{R}(M_{2,2,2}) = 7$. 

To prove Theorem 3.9 we first decomposed $\sigma_6(Seg(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3))$ into various components 
based on how the limiting $\mathbb{P}^5$ was obtained from a family of secant $\mathbb{P}^5$'s. (By Theorem 3.8 one 
only needs to examine limiting planes.) For each possible limiting type we wrote down normal 
forms for the limit. Then we applied variants of Baur's proof of the Brockett-Dobkin theorem 
in each case to obtain a contradiction. In §7 we give an idea how to study such limiting planes, 
which is also used in the construction of upper bounds. 

3.4. What is not known. The central conjecture in algebraic complexity theory is that the 
exponent of matrix multiplication is two. It is also of importance to find good upper and lower 
bounds for matrix multiplication for small and human scale values of n. Already for n = 3 
all that is known is $14 \leq \underline{R}(M_{3,3,3}) \leq 21$, and $19 \leq R(M_{3,3,3}) \leq 23$. While the problem of 
finding the defining equations for secant varieties of Segre varieties is a means to an end as far 
as matrix multiplication is concerned, for the purposes of algebraic statistics, it is essential to 
develop techniques for finding these equations and the equations of related varieties. For the 
area of phylogenetic invariants, an important open problem is to find the defining equations for 
$\sigma_4(Seg(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3))$, as explained in §5.2. Other open questions are discussed in the remaining 
sections. 
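
As a sanity check on the numbers quoted for n = 3, the general lower bounds can be evaluated directly. The sketch below assumes the bounds of Theorems 3.6 and 3.7 read 3m²/2 + m/2 - 1 (border rank) and 5m²/2 - 3m (rank); the function names are invented for illustration.

```python
from fractions import Fraction
from math import ceil

def border_rank_lower_bound(m):
    """Lickteig's bound 3m^2/2 + m/2 - 1 on the border rank of m x m multiplication."""
    return Fraction(3 * m * m, 2) + Fraction(m, 2) - 1

def rank_lower_bound(m):
    """Blaser's bound 5m^2/2 - 3m on the rank of m x m multiplication."""
    return Fraction(5 * m * m, 2) - 3 * m

def integer_bound(value):
    """Ranks are integers, so a fractional lower bound may be rounded up."""
    return ceil(value)
```

For m = 3 the border rank bound evaluates to exactly 14, matching the lower end of the range above. The general rank bound gives only 27/2, rounded up to 14, so the value 19 quoted for the rank of 3 x 3 multiplication comes from Bläser's separate argument for that case rather than from the general formula.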



4. Secant varieties in algebraic geometry 

In this section we take a detour from our main subject to discuss other situations where secant 
varieties arise: the solution of the polynomial Waring problem and the resolution of Hartshorne's 
conjecture on linear normality. 

4.1. The Waring problem for polynomials and variants. The Waring problem for 
polynomials is as follows: 

What is the smallest $r_0 = r_0(d,n)$ such that a general homogeneous polynomial $P(x^1, \ldots, x^n)$ 
of degree d in n variables is expressible as the sum of $r_0$ d-th powers of linear forms? 

Let $V = \mathbb{C}^n$, and let $S^d V^*$ denote the space of homogeneous polynomials of degree d on V. 
Let 

\[ v_d : \mathbb{P}V^* \to \mathbb{P}S^d V^*, \qquad [\alpha] \mapsto [\alpha \circ \cdots \circ \alpha] \]

denote the Veronese map that sends the projectivization of a linear form to the projectivization of 
its d-th power. Thus the image is the set of (projectivized) d-th powers of linear forms. Similarly, 
$\sigma_p(v_d(\mathbb{P}V^*))$ is the Zariski closure of the set of homogeneous polynomials that are expressible 
as the sum of p d-th powers of linear forms. So the Waring problem for polynomials may be 
re-expressed as: 

Let $V = \mathbb{C}^n$ and let $X = v_d(\mathbb{P}V^*)$. What is the typical X-rank of an element of $\mathbb{P}S^d V^*$, i.e., 
what is the smallest $r_0 = r_0(d,n)$ such that $\sigma_{r_0}(v_d(\mathbb{P}V^*)) = \mathbb{P}S^d V^*$? 

This problem was solved by Alexander and Hirschowitz [3]: all $\sigma_r(v_d(\mathbb{P}^n))$ are of the expected 
dimension except $\sigma_7(v_3(\mathbb{P}^4))$, $\sigma_5(v_4(\mathbb{P}^2))$, $\sigma_9(v_4(\mathbb{P}^3))$, $\sigma_{14}(v_4(\mathbb{P}^4))$ (which are all hypersurfaces), 
and $\sigma_r(v_2(\mathbb{P}^n))$, $2 \leq r \leq n$ (where $\dim \sigma_r(v_2(\mathbb{P}^n)) = rn - \frac{r(r-3)}{2} - 1$). In other words, 

Theorem 4.1. [4] A general homogeneous polynomial of degree d in n variables is expressible 
as the sum of 

\[ r_0(d,n) = \left\lceil \frac{1}{n} \binom{n+d-1}{d} \right\rceil \]

d-th powers with the exception of the cases $r_0(3,5) = 8$, $r_0(4,3) = 6$, $r_0(4,4) = 10$, $r_0(4,5) = 15$, 
and d = 2, where $r_0(2,n) = n$. 

For a beautiful discussion of this problem and its history, including a self-contained proof, see 

A variant of the polynomial Waring problem is to find the typical rank of alternating tensors. 
Let $\Lambda^k V \subset V^{\otimes k}$ be the space of alternating tensors. Let $G(k, V) \subset \mathbb{P}(\Lambda^k V)$ denote the 
projectivization of the set of minimal rank alternating tensors. This variety is called the Grassmannian 
of k-planes through the origin in V (i.e., we have a bijection, for linearly independent sets of 
vectors $v_1, \ldots, v_k$, $\mathrm{Span}\{v_1, \ldots, v_k\} \leftrightarrow [v_1 \wedge \cdots \wedge v_k]$). In [18] it is shown that for $3 \leq k \leq \frac{n}{2}$, 
$\sigma_r(G(k,n))$ has the expected dimension provided that $r \leq \frac{n}{k}$. Previous to that, it was known 
that G(2,n) had all secant varieties defective and G(3,7), G(4,8), and G(3,9) all had their 
"last" secant variety before filling defective. (The examples G(2,n) are just the skew symmetric 
matrices of minimal rank; the examples G(3,7) and G(4,8) can be understood in terms of the 
geometry of the exceptional groups $G_2$ and $Spin_7$.) 

Further generalizations of the polynomial Waring problem and their uses are discussed in [21]. 
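
The answer to the polynomial Waring problem stated in Theorem 4.1 is easy to tabulate. The sketch below assumes the generic count is the ceiling of binom(n+d-1, d)/n, the dimension of the space of degree d forms divided by n, which is consistent with each of the listed exceptions being one more than that count; the function name is hypothetical.

```python
from math import comb

# Exceptional values r0(d, n) listed in Theorem 4.1, keyed by (degree, variables).
EXCEPTIONS = {(3, 5): 8, (4, 3): 6, (4, 4): 10, (4, 5): 15}

def generic_waring_rank(d, n):
    """Number of d-th powers needed for a general degree d form in n variables."""
    if d == 2:
        return n                      # quadrics: the rank of a general symmetric matrix
    if (d, n) in EXCEPTIONS:
        return EXCEPTIONS[(d, n)]
    # Ceiling of dim S^d C^n divided by n, in integer arithmetic.
    return -(-comb(n + d - 1, d) // n)
```

For example, `generic_waring_rank(3, 3)` is 4, so a general plane cubic is a sum of four cubes of linear forms, while `generic_waring_rank(4, 3)` is 6 rather than the naive count 5, reflecting the defectivity of the fifth secant variety of the quartic Veronese surface.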

The main tool for proving secant varieties are of the expected dimension is Terracini's Lemma 
9.1. Proving they are degenerate, other than in cases when it is obvious, is more subtle. For 
all the Waring problems, there appear to be interpretations of the exceptional cases in terms 
of the geometry of Veronese varieties. The most interesting exception in the case of secant 
varieties of Segre varieties is $\sigma_4(Seg(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2))$, which is discussed in detail in §6. In the 
proof of Lemma 3.16 of [1], a geometric explanation of the degeneracy is given: any four points 
on $Seg(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2)$ lie in some $v_3(\mathbb{P}^2) \subset \mathbb{P}(S^3 \mathbb{C}^3) \subset \mathbb{P}(\mathbb{C}^3 \otimes \mathbb{C}^3 \otimes \mathbb{C}^3)$. Thus when one applies 
Terracini's lemma, each of the four embedded tangent spaces to the Segre must have at least a 
two-dimensional subspace in the $\mathbb{P}(S^3 \mathbb{C}^3) = \mathbb{P}^9$, forcing a degeneracy. It would be interesting to 
have a systematic understanding of the Veronese varieties that unirule these exceptional cases, 
e.g., in terms of representation-theoretic data. 

4.2. Zak's theorems. Smooth projective varieties $X^n \subset \mathbb{P}^{n+a}$ of small codimension were shown 
by Barth and Larsen (see, e.g., [7]) to behave topologically as if they were complete intersections, 
i.e., the zero set of $a$ homogeneous polynomials. This motivated Hartshorne's famous conjecture 
on complete intersections [30], which says that if $a < \frac{n}{2}$ then X must indeed be a complete 
intersection. A first approximation to this difficult conjecture was also made by Hartshorne - his 
conjecture on linear normality, which was proved by Zak [58] (see [59] for an exposition). The 
linear normality conjecture was equivalent to showing that if $a < \frac{n}{2} + 2$, and X is not contained 
in a hyperplane, then $\sigma_2(X) = \mathbb{P}^{n+a}$. Zak went on to classify the exceptions in the equality case 
$a = \frac{n}{2} + 2$. There are exactly four, which Zak called Severi varieties (after Severi, who solved 
the n = 2 case [50]). The first three Severi varieties have already been introduced: $v_2(\mathbb{P}^2) \subset \mathbb{P}^5$, 
$Seg(\mathbb{P}^2 \times \mathbb{P}^2) \subset \mathbb{P}^8$, and $G(2,6) \subset \mathbb{P}^{14}$. The last is the complexified Cayley plane $\mathbb{OP}^2 \subset \mathbb{P}^{26}$. 
These four varieties admit a uniform interpretation as the rank one elements in a rank three 
Jordan algebra over a composition algebra. 

An interesting open question is the secant defect problem. For a smooth projective variety 
$X^n \subset \mathbb{P}V$, not contained in a hyperplane, with $\sigma_2(X) \neq \mathbb{P}V$, let $\delta(X^n) = 2n + 1 - \dim \sigma_2(X)$, 
the secant defect of X. The largest known secant defect is 8, which occurs for the complexified 
Cayley plane. Problem: Is a larger secant defect than 8 possible? If we do not assume the 
variety is smooth, the defect is unbounded. (This question was posed originally in [42].) 
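
The numerology of the four Severi varieties can be tabulated from the equality case a = n/2 + 2. The sketch below relies on one assumption that is not stated in the text above: that the secant variety of each Severi variety is a hypersurface in its ambient projective space. The function name is hypothetical.

```python
def severi_numbers(n):
    """Ambient dimension and secant defect of the Severi variety of dimension n.

    Assumes (not stated in the text) that sigma_2 is a hypersurface in the
    ambient projective space P^{n+a}, where a = n/2 + 2.
    """
    if n not in (2, 4, 8, 16):
        raise ValueError("Severi varieties have dimension 2, 4, 8 or 16")
    a = n // 2 + 2                 # codimension in the equality case
    ambient = n + a                # dimension of the ambient projective space
    secant_dim = ambient - 1       # hypersurface assumption
    defect = 2 * n + 1 - secant_dim
    return ambient, defect
```

Under that assumption the ambient dimensions come out as 5, 8, 14 and 26, and the secant defects as 1, 2, 4 and 8, the last being the value quoted for the complexified Cayley plane.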

5. Other uses of secant varieties of Segre varieties and related objects 

5.1. Algebraic Statistics. A probability distribution is a point in $V := \mathbb{R}^{a_1} \otimes \cdots \otimes \mathbb{R}^{a_n}$ 
whose coordinates sum to one. For example, say we have two biased coins. Then 
$V = \mathbb{R}^2 \otimes \mathbb{R}^2$ and a point corresponds to a matrix 

\[ \begin{pmatrix} p_{h,h} & p_{h,t} \\ p_{t,h} & p_{t,t} \end{pmatrix} \]

where $p_{h,h}$ is the probability that both coins, when tossed, come up heads, etc. 

A statistical model is a family of probability distributions given by a set of constraints that 
these distributions must satisfy, i.e., a subset of V. An algebraic statistical model consists of all 
joint probability distributions that are the common zeros of a set of polynomials on V. 

To continue our example, assume the outcomes of the two coin tosses do not affect each other 
(as is the case with actual coins). Then the resulting matrix must have rank one. The set of 
all rank one 2 x 2 matrices in the positive coordinate simplex is the corresponding algebraic 
statistical model, but it is almost equivalent to work with $Seg(\mathbb{RP}^1 \times \mathbb{RP}^1)$. 

Now assume we can measure the outcome of two of the events (tosses) but there may be a 
third event whose outcome influences the outcome of the other two although the outcomes of 
the two events we can measure are independent of one another (e.g. someone may be cheating 
by using magnets). 

Naively we should have a point of $\mathbb{R}^{a_1} \otimes \mathbb{R}^{a_2} \otimes \mathbb{R}^{a_3}$ but we can't measure the possible third; in 
fact we don't even know what $a_3$ should be. 

Let's posit that some fixed $a_3$ parametrizes the third outcome (if we posit there is no third 
event, then one takes $a_3 = 1$). Then we sum up over all possibilities for the third factor to get 
an $a_1 \times a_2$ matrix whose entries are 

(6) \[ p_{i,j} = p_{i,j,1} + \cdots + p_{i,j,a_3}, \qquad 1 \leq i \leq a_1,\ 1 \leq j \leq a_2. \]

The algebraic statistical model here is the set of rank at most $a_3$ matrices in the space of $a_1 \times a_2$ 
matrices, $\sigma_{a_3}(Seg(\mathbb{RP}^{a_1-1} \times \mathbb{RP}^{a_2-1}))$. Thus, given a particular model, e.g. a fixed value of $a_3$, 
to test if our data (as points of $\mathbb{R}^{a_1} \otimes \mathbb{R}^{a_2}$) fits the model, we can check if it (mostly) lies inside 
$\sigma_{a_3}(Seg(\mathbb{RP}^{a_1-1} \times \mathbb{RP}^{a_2-1}))$. 
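
The rank condition can be seen in a small computation. The sketch below assumes a hidden variable with $a_3$ states and weights w_k, and observed outcomes that are independent once the hidden state is fixed, so that $p_{i,j,k} = w_k x_{k,i} y_{k,j}$; summing over k as in (6) gives a matrix that is a sum of $a_3$ rank one matrices. All names and the sample numbers are illustrative assumptions.

```python
from fractions import Fraction

def marginal(weights, rows, cols):
    """P[i][j] = sum_k weights[k] * rows[k][i] * cols[k][j], as in equation (6)."""
    a1, a2 = len(rows[0]), len(cols[0])
    return [[sum(w * r[i] * c[j] for w, r, c in zip(weights, rows, cols))
             for j in range(a2)] for i in range(a1)]

def matrix_rank(m):
    """Rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in m]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

# Hypothetical example with a1 = a2 = 3 observed outcomes and a3 = 2 hidden states.
F = Fraction
WEIGHTS = [F(1, 3), F(2, 3)]
ROWS = [[F(1, 2), F(1, 4), F(1, 4)], [F(1, 5), F(1, 5), F(3, 5)]]
COLS = [[F(1, 6), F(1, 3), F(1, 2)], [F(1, 2), F(1, 4), F(1, 4)]]
P = marginal(WEIGHTS, ROWS, COLS)
```

Here `P` is a 3 x 3 matrix whose entries sum to one and whose rank is 2, the number of hidden states, even though a general 3 x 3 probability matrix has rank 3.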

In algebraic statistics one wants to test if a given model is applicable to a particular collection 
of data sets. Thus in particular, one needs a way of testing if a point $p \in \mathbb{R}^{a_1} \otimes \cdots \otimes \mathbb{R}^{a_n}$ is a 
sum of at most r decomposable elements. 

It is easier to solve this problem first over the complex numbers and then return to the real 
situation later. Thus to test models of the type discussed above, one needs defining equations for 
secant varieties of Segre varieties. In §6-§13 we discuss methods for finding such 
equations. These methods are applicable to finding equations for more general algebraic statistical 
models as well. They all rely on exploiting the group under which the model is invariant. 

For more on algebraic statistics, see [32], [47]. 

5.2. Phylogenetic invariants. This is a special case of algebraic statistics, but is sufficiently 
important to merit its own subsection. In order to determine a tree that describes the evolu- 
tionary descent of a family of extant species, Lake [35], Cavender and Felsenstein [20J proposed 
the use of what is now called algebraic statistics by viewing the four bases composing DNA as 
the possible outcomes of a random variable. 

Given a collection of extant species, one would like to assess the likelihood of each of the
possible evolutionary trees that could have led to them. To do this, one can test the various 
DNA sequences that arise to see which algebraic statistical model fits best. More than that, the 
invariants discussed below identify the trees (nearly) uniquely. 

In what follows, contrary to some of the literature, we ignore time. 

The simplest situation is where one species gives rise to two new species. This can be pictured 
by a tree of the form

Figure 1. A tree with root F and two leaves A1, A2.

There are three species involved, the parent F and the two offspring A1, A2, so the DNA
occupies a point of the positive coordinate simplex in $\mathbb{R}^4 \otimes \mathbb{R}^4 \otimes \mathbb{R}^4$, and we make our lives easier
by working with $\mathbb{P}(\mathbb{C}^4 \otimes \mathbb{C}^4 \otimes \mathbb{C}^4)$. We can measure the DNA of the two new species but not the
ancestor, so the relevant algebraic statistical model is $\sigma_4(\operatorname{Seg}(\mathbb{P}^3 \times \mathbb{P}^3))$, which is well understood.
Here $a_1 = a_2 = a_3$ in the analogue of equation (6) and we sum over the third factor. In this
case there is nothing new to be learned from the model.

The next case is where a parent F gives rise to three new species A1, A2, A3. Assuming species
bifurcate, one might think that this gives rise to three distinct algebraic statistical models, as we
could have F giving rise to A1 and G, then G splitting to A2 and A3, or two other possibilities.
However, all three scenarios give rise to the same algebraic statistical model: $\sigma_4(\operatorname{Seg}(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3))$.
(See [6].) In other words, the following pictures all give rise to the same algebraic statistical
model.

Figure 2. Three bifurcating trees with root F and leaves A1, A2, A3.



The defining equations of $\sigma_4(\operatorname{Seg}(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3))$ are not known, and for reasons we explain
below, it is a central question for the study of phylogenetic invariants to find them.

Now consider the case where there are four new species A1, A2, A3, A4 all from a common
ancestor F. Here finally there are three different scenarios that give rise to distinct algebraic
statistical models.

Figure 3. Three trees with root F and leaves A1, A2, A3, A4, pairing the leaves as
(A1 A2 | A3 A4), (A1 A3 | A2 A4) and (A1 A4 | A2 A3).

Note that there are no pictures like

Figure 4.

because such trees give rise to algebraic statistical models equivalent to those of the exhibited trees.

We consider that parent F first gives rise to A1 and E, and then E gives rise to A2 and G,
and G gives rise to A3 and A4, as well as the equivalent (by the discussion above) scenarios. The
resulting algebraic statistical model is

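\[ \Sigma_{12,34} := \sigma_4(\operatorname{Seg}(\mathbb{P}A_1 \times \mathbb{P}A_2 \times \mathbb{P}(A_3 \otimes A_4))) \cap \sigma_4(\operatorname{Seg}(\mathbb{P}(A_1 \otimes A_2) \times \mathbb{P}A_3 \times \mathbb{P}A_4)) \]

Similarly we get the other two possibilities

\[ \Sigma_{13,24} := \sigma_4(\operatorname{Seg}(\mathbb{P}A_1 \times \mathbb{P}A_3 \times \mathbb{P}(A_2 \otimes A_4))) \cap \sigma_4(\operatorname{Seg}(\mathbb{P}(A_1 \otimes A_3) \times \mathbb{P}A_2 \times \mathbb{P}A_4)) \]

and

\[ \Sigma_{14,23} := \sigma_4(\operatorname{Seg}(\mathbb{P}A_1 \times \mathbb{P}A_4 \times \mathbb{P}(A_2 \otimes A_3))) \cap \sigma_4(\operatorname{Seg}(\mathbb{P}(A_1 \otimes A_4) \times \mathbb{P}A_2 \times \mathbb{P}A_3)) \]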

Note that these three are isomorphic as projective varieties, but are situated differently in
$\mathbb{P}(A_1 \otimes A_2 \otimes A_3 \otimes A_4)$, thus having defining equations for them would enable one to test between
different evolutionary possibilities. An essential result of [6] is:

Once one has defining equations for $\sigma_4(\operatorname{Seg}(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3))$, one has defining equations for all
algebraic statistical models corresponding to bifurcating phylogenetic trees.

The proof relies on two results. First, no matter how many species one observes, because of
the structure of the evolutionary trees, the resulting algebraic statistical model is an intersection
of fourth secant varieties of Segre varieties corresponding to summing over the four outcomes on
a hidden variable. The second ([6], Theorem 11) is equivalent to (and arrived at independently
of) Proposition 12.2 below, which in particular reduces the study of the fourth secant variety of
any triple Segre product to the study of $\sigma_4(\operatorname{Seg}(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3))$.

5.3. Entanglement and quantum computing. In quantum computing (see, e.g., [8] and
the numerous references therein) a pure state corresponds to a point of $\mathbb{P}(\mathbb{C}^2 \otimes \cdots \otimes \mathbb{C}^2)$ where
there are $N$ copies of $\mathbb{C}^2$. A product state corresponds to a point of $\operatorname{Seg}(\mathbb{P}^1 \times \cdots \times \mathbb{P}^1) \subset
\mathbb{P}(\mathbb{C}^2 \otimes \cdots \otimes \mathbb{C}^2)$. A pure state is entangled if it is not a product state, and quantum computing
is based on exploiting entangled states. A perhaps overly optimistic program is to classify
the $U(2) \times \cdots \times U(2)$ and/or $SL(2, \mathbb{C}) \times \cdots \times SL(2, \mathbb{C})$ orbits in $\mathbb{C}^2 \otimes \cdots \otimes \mathbb{C}^2$, which would
give a complete classification of entangled states. Failing that, one is interested in finding
specific measures of entanglement. One measure of entanglement is called the Schmidt measure,
introduced in [26]. In the language of this paper, the Schmidt measure of a tensor is the base
two log of its rank. In [25] they observe that a tensor of a given Schmidt measure might be a
limit of tensors of a lower Schmidt measure; in fact they give the explicit example of (4) in their
equation (19), where their 1 1, 0, > corresponds to ai<8)6i®ci in (j4]). In [25] they decompose 
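$\mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2 \setminus 0$ into the union of four disjoint components which they label $S$, $B$, $W$, $GHZ$. In the
language of this paper, the components are

\[
\begin{aligned}
S &= \operatorname{Seg}(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1) \setminus 0 = \operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C) \setminus 0, \\
B &= \{\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}(B \otimes C)) \cup \operatorname{Seg}(\mathbb{P}(A \otimes B) \times \mathbb{P}C) \cup \operatorname{Seg}(\mathbb{P}(A \otimes C) \times \mathbb{P}B)\} \setminus \{0 \cup S\}, \\
W &= \hat{\tau}(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)) \setminus \{0 \cup S \cup B\}, \\
GHZ &= \mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2 \setminus \{0 \cup S \cup B \cup W\}.
\end{aligned}
\]

Compare $B$ with the discussion of flattenings in §12.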

There is a vast literature regarding entanglement and there does not appear yet to be a 
consensus regarding what is the best way to measure entanglement, but it is clear that secant 
varieties of Segre varieties and related auxiliary varieties are relevant for the problem. 

6. Strassen's equations and lower bounds for rank and border rank

In this section we introduce Strassen's equations and use them to give a new proof of Bläser's
5/2-theorem. In §10 we rephrase the equations invariantly and give generalizations.

6.1. Strassen's equations. Recall the notation $a = \dim A$, $b = \dim B$, $c = \dim C$.

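Theorem 6.1 (Strassen [52]). Let $3 \leq a \leq b = c \leq r$. Let $T \in \sigma_r(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ and
$\alpha \in A^*$ be such that $T_\alpha := T(\alpha) \in B \otimes C$, considered as a map $T_\alpha : C^* \to B$, is of full rank. For
each $\alpha^1, \alpha^2 \in A^*$, define the linear map $T_{\alpha,\alpha^j} : B \to B$ by $T_{\alpha,\alpha^j} = T_{\alpha^j} T_\alpha^{-1}$. Then

\[ \operatorname{Rank}[T_{\alpha,\alpha^1}, T_{\alpha,\alpha^2}] \leq 2(r - b) \]

where $[S, T] = ST - TS$ is the commutator of endomorphisms.

Corollary 6.2 (Strassen [52]). $\sigma_4(\operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2)) \neq \mathbb{P}(\mathbb{C}^3 \otimes \mathbb{C}^3 \otimes \mathbb{C}^3)$.

Proof of corollary. For generic $T \in A \otimes B \otimes C = \mathbb{C}^3 \otimes \mathbb{C}^3 \otimes \mathbb{C}^3$ and $\alpha, \alpha^1, \alpha^2 \in A^*$, one has
$\operatorname{Rank}([T_{\alpha,\alpha^1}, T_{\alpha,\alpha^2}]) = 3$, but for points in $\sigma_4(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$, the rank is at most two. □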
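
The statement of Theorem 6.1 and the proof of Corollary 6.2 can be checked numerically on random examples. The sketch below is an illustration under assumptions made here: tensors are stored as real arrays of shape (a, b, c), the helper names are hypothetical, and ranks are estimated with a fixed tolerance.

```python
import numpy as np

def random_tensor_of_rank(r, a, b, c, rng):
    """Sum of r random decomposable tensors, as an array of shape (a, b, c)."""
    A = rng.standard_normal((r, a))
    B = rng.standard_normal((r, b))
    C = rng.standard_normal((r, c))
    return np.einsum('ri,rj,rk->ijk', A, B, C)

def strassen_commutator_rank(T, alpha, alpha1, alpha2, tol=1e-8):
    """Numerical rank of [T_{alpha,alpha1}, T_{alpha,alpha2}] when b = c."""
    T_a = np.tensordot(alpha, T, axes=1)   # the b x c matrix T(alpha)
    T_1 = np.tensordot(alpha1, T, axes=1)
    T_2 = np.tensordot(alpha2, T, axes=1)
    inv = np.linalg.inv(T_a)               # requires T(alpha) to be of full rank
    X1, X2 = T_1 @ inv, T_2 @ inv          # the endomorphisms T_{alpha,alpha^j}
    return int(np.linalg.matrix_rank(X1 @ X2 - X2 @ X1, tol=tol))

rng = np.random.default_rng(1)
alpha, alpha1, alpha2 = rng.standard_normal((3, 3))
rank4 = strassen_commutator_rank(random_tensor_of_rank(4, 3, 3, 3, rng), alpha, alpha1, alpha2)
generic = strassen_commutator_rank(rng.standard_normal((3, 3, 3)), alpha, alpha1, alpha2)
print(rank4, generic)
```

A random sum of four decomposable tensors should give commutator rank at most 2(4 - 3) = 2, while a random tensor should give rank 3, which is the dichotomy used in the proof of the corollary.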

Note that an easy calculation with Terracini's lemma (Lemma 9.1) shows that $\sigma_4(\operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2))$
is at least a hypersurface, so the above corollary shows it is exactly a hypersurface. Strassen's
equations are not presented as polynomials above. In §10 we describe them as polynomials and
give generalizations.

Corollary 6.3 (Strassen [52]). $\underline{R}(M_{m,m,m}) \geq \tfrac{3}{2} m^2$.

Proof. Write out $M_{m,m,m}$ explicitly in a good basis and take a generic $\alpha \in A^* = \operatorname{Mat}_{m \times m}$.
Then the corresponding linear map $T_\alpha$ is a block diagonal matrix with blocks of size $m$, each
block identical and the entries of the block arbitrary. So we have $\operatorname{Rank}([T_{\alpha,\alpha^1}, T_{\alpha,\alpha^2}]) = m^2$.
Hence $m^2 \leq 2(r - m^2)$ and the result follows. □
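
The rank computation in this proof can be illustrated numerically. In the sketch below (function names are assumptions made here) the slice of the matrix multiplication tensor at a matrix x is modelled as left multiplication by x on column-major vectorised matrices, which is the block diagonal matrix with m identical blocks x.

```python
import numpy as np

def left_multiplication(x):
    """Matrix of y -> x y on column-major vectorised m x m matrices:
    block diagonal with m identical blocks equal to x."""
    return np.kron(np.eye(x.shape[0]), x)

def commutator_rank_for_matmul(m, rng, tol=1e-8):
    """Numerical rank of the Strassen commutator for random slices of M_{m,m,m}."""
    x0, x1, x2 = rng.standard_normal((3, m, m))
    T0, T1, T2 = (left_multiplication(x) for x in (x0, x1, x2))
    inv = np.linalg.inv(T0)
    X1, X2 = T1 @ inv, T2 @ inv
    return int(np.linalg.matrix_rank(X1 @ X2 - X2 @ X1, tol=tol))

def strassen_border_rank_bound(m):
    """Smallest integer r satisfying m^2 <= 2 (r - m^2)."""
    return (3 * m * m + 1) // 2

rng = np.random.default_rng(2)
for m in (2, 3):
    print(m, commutator_rank_for_matmul(m, rng), strassen_border_rank_bound(m))
```

For m = 2 the resulting bound is 6, which is consistent with the border rank 7 quoted from [36] later in the text.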

6.2. Proof of Bläser's lower bound. Here is a proof of Theorem 3.7 that uses Theorem 6.1,
which is implicit, but hidden, in his original proof.

Lemma 6.4. Let $U$ be a vector space, let $P \in S^d U^* \setminus 0$. Let $u_1, \ldots, u_n$ be a basis of $U$. Then
there exists a subset $u_{i_1}, \ldots, u_{i_s}$ of cardinality $s \leq d$ such that $P|_{\langle u_{i_1}, \ldots, u_{i_s} \rangle}$ is not identically
zero.

The proof is an easy exercise. 

Lemma 6.5. Given any basis of $\operatorname{Mat}^*_{m \times m}$, there exists a subset of at least $m^2 - 3m$ basis vectors
that annihilate elements $\operatorname{Id}, x, y \in \operatorname{Mat}_{m \times m}$ such that $[x, y] := xy - yx$ has maximal rank $m$.
Proof. Let $A = \operatorname{Mat}_{m \times m} = U^* \otimes W$. Fixing a basis of $A^*$ is equivalent to fixing its dual basis of
$A$. By Lemma 6.4 with $P = \det$, we may find a subset $S_1$ of at most $m$ elements of our basis
of $A$ with some $z \in \operatorname{Span}(S_1)$ with $\det(z) \neq 0$. We use $z : U \to W$ to identify $U \simeq W$, which
enables us to now consider $A$ as an algebra with $z$ playing the role of the identity element.

Now let $a \in A$ be generic. Then the map $\operatorname{ad}(a) : A \to A$, $x \mapsto [a, x]$ will have a one-
dimensional kernel. By letting $P = \operatorname{ad}(a)^*(\det)$ and applying Lemma 6.4 again, we may find a
subset $S_2$ of our basis of cardinality at most $m$ such that there is an element $x \in A$ such that
$\operatorname{ad}(a)(x)$ is invertible. Note that $\operatorname{ad}(x) : A \to A$ also is such that there are elements $y$ with
$\operatorname{ad}(x)y$ invertible. Thus we may apply Lemma 6.4 a third time to find a cardinality at most $m$
subset $S_3$ of our basis such that $\operatorname{ad}(x)y$ is invertible. Now in the worst possible case our three
subsets are of maximal cardinality and do not intersect, in which case we have a cardinality
$m^2 - 3m$ subset of our dual basis that annihilates $z = \operatorname{Id}$, $x$, $y$ with $\operatorname{Rank}([x, y]) = m$. □

Proof of Theorem 3.7. Let $\phi$ denote a computation of $M = M_{m,m,m}$ of length $r$. Since $\operatorname{Lker}(M) = 0$
(i.e., $\forall a \in A \setminus 0$, $\exists b \in B$ such that $M(a, b) \neq 0$) we may write $\phi = \psi_1 + \psi_2$ with $R(\psi_1) = m^2$,
$R(\psi_2) = r - m^2$ and $\operatorname{Lker}(\psi_1) = 0$. Now consider the $m^2$ elements of $A^*$ appearing in $\psi_1$. Since
they span $A^*$, by Lemma 6.5 we may choose a subset of $m^2 - 3m$ of them that annihilate $\operatorname{Id}$, $x$
and $y$, where $x, y$ are such that $[x, y]$ has full rank. Let $\phi_1$ denote the sum of all monomials in
$\psi_1$ whose $A^*$ terms annihilate $\operatorname{Id}, x, y$, so $R(\phi_1) \geq m^2 - 3m$. Let $\phi_2 = \psi_1 - \phi_1 + \psi_2$.

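Now apply Theorem 6.1 with $T = \phi_2$, $\alpha = \operatorname{Id}$, $\alpha^1 = x$, $\alpha^2 = y$ to get $R(\phi_2) \geq \tfrac{1}{2}\operatorname{rank}[x, y] +
m^2 = \tfrac{3}{2} m^2$ (here $[x, y]$ is regarded as an endomorphism of $B$, so its rank is $m^2$, as in the proof of
Corollary 6.3), and thus $R(\phi_1 + \phi_2) \geq \tfrac{5}{2} m^2 - 3m$. □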

7. Limits of secant planes 

There are several reasons for studying points on $\sigma_r(\operatorname{Seg}(\mathbb{P}A_1 \times \cdots \times \mathbb{P}A_n))$ that are not on
secant $\mathbb{P}^{r-1}$'s. First, in order to prove a set of equations $E$ is a set of defining equations for
$\sigma_r(\operatorname{Seg}(\mathbb{P}A_1 \times \cdots \times \mathbb{P}A_n))$, one must prove that any point in the zero set of $E$ is either a point
on a secant $\mathbb{P}^{r-1}$ or on a limit $\mathbb{P}^{r-1}$. For example, the proof of the set-theoretic GSS conjecture
(see §12) in [37] proceeded in this fashion. Second, to prove lower bounds for the border rank of
a given tensor, e.g., matrix multiplication, one could try to prove first it cannot lie on any secant
$\mathbb{P}^{r-1}$ and then that it cannot lie on any limiting $\mathbb{P}^{r-1}$ either. This was the technique of proving
$\underline{R}(M_{2,2,2}) = 7$ in [36]. Finally, a central ingredient for writing explicit approximate algorithms
is to exploit certain limiting $\mathbb{P}^{r-1}$'s discussed below.

This section and the next are not used in the remainder of the article so they can be skipped 
by readers primarily interested in the equations of secant varieties of Segre varieties. 

7.1. Limits for arbitrary projective varieties. Let $X \subset \mathbb{P}V$ be a projective variety. Let
$\sigma_r^0(X)$ denote the set of points on $\sigma_r(X)$ that lie on a secant $\mathbb{P}^{r-1}$. We work inductively, so we
assume we know the nature of points on $\sigma_{r-1}(X)$ and study points on $\sigma_r(X) \setminus (\sigma_r^0(X) \cup \sigma_{r-1}(X))$.

It is convenient to study the limiting $r$-planes as points on the cone over the Grassmannian
in its Plücker embedding, $G(r, V) \subset \mathbb{P}(\Lambda^r V)$ (see the end of §4.1). I.e., we consider the curve
of $r$-planes as being represented by $x_1(t) \wedge \cdots \wedge x_r(t)$ and examine the limiting plane as $t \to 0$.
(There must be a unique such plane as the Grassmannian is compact.)

Let $[p] \in \sigma_r(X)$. Then there exist curves $x_1(t), \ldots, x_r(t) \subset X$ with $p \in \lim_{t \to 0} \langle x_1(t), \ldots, x_r(t) \rangle$.
We are interested in the case when $\dim \langle x_1(0), \ldots, x_r(0) \rangle < r$. (Here $\langle v_1, \ldots, v_k \rangle$ denotes the
linear span of the vectors $v_1, \ldots, v_k$.) Use the notation $x_j = x_j(0)$. Assume for the moment
that $x_1, \ldots, x_{r-1}$ are linearly independent. Then we may write $x_r = c_1 x_1 + \cdots + c_{r-1} x_{r-1}$ for
some constants $c_1, \ldots, c_{r-1}$. Write each curve $x_j(t) = x_j + t x'_j + t^2 x''_j + \cdots$ where derivatives
are taken at $t = 0$.
Consider the Taylor series

\[
\begin{aligned}
x_1(t) \wedge \cdots \wedge x_r(t) &= (x_1 + t x'_1 + t^2 x''_1 + \cdots) \wedge \cdots \wedge (x_{r-1} + t x'_{r-1} + t^2 x''_{r-1} + \cdots) \wedge (x_r + t x'_r + t^2 x''_r + \cdots) \\
&= t \left( (-1)^r (c_1 x'_1 + \cdots + c_{r-1} x'_{r-1} - x'_r) \wedge x_1 \wedge \cdots \wedge x_{r-1} \right) + t^2 (\ldots) + \cdots
\end{aligned}
\]

If the $t$ coefficient is nonzero, then $p$ lies in the $r$-plane $\langle x_1, \ldots, x_{r-1}, (c_1 x'_1 + \cdots + c_{r-1} x'_{r-1} - x'_r) \rangle$.

If the $t$ coefficient is zero, then $c_1 x'_1 + \cdots + c_{r-1} x'_{r-1} - x'_r = e_1 x_1 + \cdots + e_{r-1} x_{r-1}$ for some
constants $e_1, \ldots, e_{r-1}$. In this case we must examine the $t^2$ coefficient of the expansion. It is

\[ \Big( \sum_{k=1}^{r-1} e_k x'_k + \sum_{j=1}^{r-1} c_j x''_j - x''_r \Big) \wedge x_1 \wedge \cdots \wedge x_{r-1} \]

One continues to higher order terms if this is zero.

The algorithm of Example 8.1 below uses the $t$ coefficient, the algorithm of Example 8.3 uses
the $t^2$ coefficient, and Algorithm 8.2 in [38] uses the coefficient of $t^{20}$!

7.2. Limits for Segre varieties. A general curve on $\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)$ is of the form
$x(t) = a(t) \otimes b(t) \otimes c(t)$ where $a(t), b(t), c(t)$ are respectively arbitrary curves in $A \setminus 0$, $B \setminus 0$, $C \setminus 0$
with $a(0) = a$ etc. We have $x' = a' \otimes b \otimes c + a \otimes b' \otimes c + a \otimes b \otimes c'$ where $a', b', c'$ are respectively
arbitrary elements of $A, B, C$, and higher order derivatives are obtained similarly.
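
As a small numerical illustration of the derivative formula, the sketch below compares the difference quotient of a curve on the Segre with the tangent vector $x'$. The straight-line curves $a(t) = a + t a'$ and the function names are choices made here for illustration only.

```python
import numpy as np

def segre(a, b, c):
    """The decomposable tensor a (x) b (x) c as an array."""
    return np.einsum('i,j,k->ijk', a, b, c)

def tangent_vector(a, b, c, da, db, dc):
    """x' = a' (x) b (x) c + a (x) b' (x) c + a (x) b (x) c'."""
    return segre(da, b, c) + segre(a, db, c) + segre(a, b, dc)

def difference_quotient(a, b, c, da, db, dc, t):
    """(x(t) - x(0)) / t for the curve x(t) = (a + t a') (x) (b + t b') (x) (c + t c')."""
    return (segre(a + t * da, b + t * db, c + t * dc) - segre(a, b, c)) / t

e1, e2 = np.eye(2)
a = b = c = e1
da = db = dc = e2
target = tangent_vector(a, b, c, da, db, dc)
errors = [np.linalg.norm(difference_quotient(a, b, c, da, db, dc, t) - target)
          for t in (1e-1, 1e-2, 1e-3)]
print(errors)
```

Each difference quotient is a combination of two points of the Segre, so the tangent vector is a limit of points on secant lines; the printed errors shrink linearly in $t$.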

While the easiest way to obtain r points that are linearly dependent in the limit is to have 
two points limit to the same point, this turns out to be not as useful for upper bound algorithms 
as more subtle limits. On the other hand, when r is sufficiently small, any other type of limit 
involves exploiting the geometry of the Segre variety as we now explain. 

To simplify the situation, we work inductively and just look at "primitive" cases, i.e., require
that the points on the limiting $\mathbb{P}^{r-1}$ do not lie on $\sigma_r(\operatorname{Seg}(\mathbb{P}A' \times \mathbb{P}B' \times \mathbb{P}C'))$ where $\dim A' \leq
\dim A$ etc. (with at least one inequality strict), and moreover that the points do not lie on
$\sigma_{r-1}(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$.

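For example, for the two factor Segre $\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B)$ (which, if we are working by induction,
must be studied for the three factor case, as it corresponds to the case $\dim C = 1$), in order
to have $x_1, \ldots, x_r \in \operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B)$ such that $\dim \langle x_1, \ldots, x_r \rangle < r - 1$ and the points are not
contained in some $\operatorname{Seg}(\mathbb{P}A' \times \mathbb{P}B')$, we must have $a + b < r$ (see the erratum to [36]). In the
erratum to [36] we determine all possible $x_1, \ldots, x_6 \in \operatorname{Seg}(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3)$ with $\dim \langle x_1, \ldots, x_6 \rangle < 6$.
The only possible cases where the points fail to lie in some $\operatorname{Seg}(\mathbb{P}^0 \times \mathbb{P}B \times \mathbb{P}C)$ occur when they
all lie in some $\operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2)$.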

A basic property of projective space is that if $X^n \subset \mathbb{P}^{n+a}$ is a subvariety, then a general $\mathbb{P}^a$
will intersect $X$ in $\deg(X)$ points. (In fact this is the definition of the degree of $X$.) One can
calculate that $\deg(\operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^2)) = 6$ (see, e.g., [29], lecture 18) and $\operatorname{codim}(\operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^2)) = 4$.
Therefore, for any set of 5 points on $\operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^2)$ that are linearly independent, i.e., that span
a $\mathbb{P}^4$, there is a sixth point in the $\mathbb{P}^4$ that also lies on the Segre. Taking the span of these six
points as our $x_j(0)$, we get a limit set that allows the use of derivatives. This type of limit set is
used several times in Example 8.3 to build Schönhage's approximate algorithm for multiplying
3 x 3 matrices using 21 multiplications.

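Similarly $\deg(\operatorname{Seg}(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1)) = 6$ and $\operatorname{codim}(\operatorname{Seg}(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1)) = 4$, which is exploited
in Example 8.1.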
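
The two degree and codimension computations quoted above can be reproduced with a few lines. The sketch uses the standard multinomial formula for the degree of a Segre variety, which is not stated in the text and is brought in here as an outside fact; the function names are ours.

```python
from math import factorial

def segre_degree(*dims):
    """Degree of Seg(P^{n_1} x ... x P^{n_k}): the multinomial (n_1 + ... + n_k)! / (n_1! ... n_k!)."""
    degree = factorial(sum(dims))
    for n in dims:
        degree //= factorial(n)
    return degree

def segre_codimension(*dims):
    """Codimension of Seg(P^{n_1} x ... x P^{n_k}) in the projective space of the tensor product."""
    ambient_vector_dim = 1
    for n in dims:
        ambient_vector_dim *= n + 1
    return (ambient_vector_dim - 1) - sum(dims)

print(segre_degree(2, 2), segre_codimension(2, 2))
print(segre_degree(1, 1, 1), segre_codimension(1, 1, 1))
```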

8. Upper bounds 

We now discuss how to use the geometry discussed above to find explicit approximate algo- 
rithms for executing a bilinear map. 
8.1. Schönhage's results. Schönhage [38] isolated a common aspect to certain approximate
algorithms for matrix multiplication which enabled him to generalize them and prove upper
bounds for the exponent of matrix multiplication without even having explicit approximate
algorithms. The essence of his idea is as follows:

Say we have two bilinear maps $f : U^* \times V^* \to W$ and $g : U'^* \times V'^* \to W'$. Under certain
conditions, $\underline{R}(f \oplus g) < \underline{R}(f) + \underline{R}(g)$, where $f \oplus g : (U \oplus U')^* \times (V \oplus V')^* \to W \oplus W'$.

Letting $A = U \oplus U'$, $B = V \oplus V'$, $C = W \oplus W'$, recall that curves $x_j(t)$ on $\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)$
are of the form $a_j(t) \otimes b_j(t) \otimes c_j(t)$. We will obtain an approximate algorithm for $f \oplus g$ by having
the $a_j(0)$ be the $U$ vectors needed for the $f$ factor, the $a_j(0)' = 0$, and the $a_j(0)''$ be the $U'$ vectors
needed for the $g$ factor. Then for the $B$ factor we take the $b_j(0)$ to be the $V$ vectors needed for
$f$ and the $b_j(0)'$ the $V'$ vectors needed for $g$, and the $C$ limits are of the same nature as the $B$
limits. Then the sum of the second derivatives will be $f \oplus g$. The only problem is, as explained
in §7, we need the zero-th and first order terms to be linearly dependent so that we are allowed
to take the sum of the second derivatives. To obtain linear dependence, the points must lie in
some degenerate position with respect to the Segre, but this is difficult to arrange. Schönhage's
solution is to have these limit points in a two factor Segre (where it is easier to have degenerate
limits), but this forces one of each $U, V, W$ and $U', V', W'$ to be one-dimensional. Moreover, these
restrictions only take care of the zero-th order term. To get the first order term killed, two of,
e.g., $U', V', W'$ are taken to be of dimension one and the third, say $W'$, to be of dimension roughly
$\dim U \dim V$ (assuming $\dim W = 1$). Even so, we still must add in a few extra terms to ensure
linear dependence, but they are small in number. Schönhage points out that in this situation
it is known that neither of $f$, $g$ admits an approximate algorithm better than the standard
algorithm. A more geometric understanding of this "trick" could lead to better upper bounds.
What follows are two examples for matrix multiplication, the second of which follows the above 
scheme. 

Example 8.1 (Bini et al.). An approximate algorithm for multiplying 2 x 2 matrices where
the first matrix has a zero in the (2, 2) slot is presented in [9]. In what follows we show how the
algorithm corresponds to a point of $\sigma_5(\operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^3 \times \mathbb{P}^3))$. (It is relatively simple to pass back and forth
between the algorithms and the description of the limiting $\mathbb{P}^4$ that lies in $\sigma_5(\operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^3 \times \mathbb{P}^3))$
that the tensor lies on. But the description of the $\mathbb{P}^4$ shows the non-uniqueness of the algorithm
and the salient geometric facts that are used more transparently.) In this case we have 5 points
that are linearly dependent. In fact only four are needed; one can take any 5-th point in the
span of the four and ignore it, as its derivatives are not needed for the algorithm. We take

xi = a\®P 2 ®c\, x 2 = a\®[3\<&c\, x 3 = a\®fil®{c\ + c\), 24 = a\®{f3\ + $)®c\. 

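Note that all these points lie on a $\operatorname{Seg}(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1)$. Because $\operatorname{codim}(\operatorname{Seg}(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1)) = 4$,
we are assured there is a fifth point of $\operatorname{Seg}(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1)$ in the span of these four. (A general
$\mathbb{P}^3$ will intersect $\operatorname{Seg}(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1)$ in $\deg(\operatorname{Seg}(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1)) = 6$ points.) Moreover, the 5-th
point will not be in the span of any three of $x_1, \ldots, x_4$. Then taking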

x[ = a\<&f3l®c\ + a\®l3l®c l 2 - a\®l3l®c l 2 , x' 2 = a\®P\®c\ + a\®0[®.c\ - a\®j3\®(^, 

x' 3 = a\®0l®(c\ + c\), x'4 = a 2 M(3\ + $)<8>4 

our matrix multiplication operator $M$ for the partially filled matrices is $M = x'_1 + x'_2 + x'_3 + x'_4$.
The fact that we didn't use any of the initial points is not surprising as the derivatives can always
be altered to incorporate the initial points.

A splitting of the computation is the key to the reduction here as well. Split the calculation 
of M into two pieces, the terms involving ot\ and the rest. Those terms involving a\ can be 
accomplished using two multiplications and the rest can be accomplished using six. We change 
notation slightly and write $x_j = a_j \otimes b_j \otimes c_j$ and $x'_j = a'_j \otimes b_j \otimes c_j + a_j \otimes b'_j \otimes c_j + a_j \otimes b_j \otimes c'_j$ as we
did before we began this example. The elements of B&C appearing with a\ each appears in 
the original x\,X2, so in order to have them appear in the final tensor we just need to take 
a 'i' a 2 = a \- Now to have the terms involving a 2 ,a1 appear in the final tensor, we need to 
differentiate the terms on the B and C factors. We can obtain two of these by setting b[ = P 2 
and c 2 = c 2 . We can get the remaining terms using x' 3 and x\ but we must introduce an 
error, which can then be absorbed by modifying b' x and c 2 . The result is that = a' 2 = a\, 
bi = fi\ — 0\, c 2 = c\ — c|, 6' 3 = 01, C4 = c\ and all the other first derivatives are zero. 

Remark 8.2. There is a similarity between this example and the algorithms using multiplicative
complexity discussed in §14.1.

Example 8.3 (Schönhage). Consider matrix multiplication of 3 x 3 matrices where in the first
matrix a\ = a\ = 0, in the second that (3 2 = /3| = fi\ = /3f = 0, and thus c 2 = c 2 = c 2 = c| = 
as well. We again split the computation into terms involving a\ and those that do not. (It 
might be useful to think of this multiplication as B x C — > A to make it look more symmetric.) 
Those that do not involve a\ use 6 multiplications in the naive algorithm and those involving 
a\ use four. 

As explained in §7.2, $\mathbb{P}^4 \cap \operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^2)$ will generally consist of $6 = \deg(\operatorname{Seg}(\mathbb{P}^2 \times \mathbb{P}^2))$ points.
Now the principle described above is used. That is, the initial 6 terms contain the correct six 
monomials in the B, C factors for the terms without a\ and the second derivatives of the A 
factor in these terms are used to provide the correct A terms, while the original A factor term is 
always a\ and it is paired with the derivatives in the B, C factors of the original terms. In this 
example, the spaces in B,C where the two different pieces live are nearly disjoint, so we need 
to differentiate twice to be able to get both the B and C coefficients new (which is why we used 
second, rather than first, derivatives in the A-factor).

What is interesting about this example is that taking three such blockings, one can "cover"
the space of three by three matrices, and adding them together obtain an approximate algorithm
for multiplying 3 x 3 matrices using 21 multiplications.

8.2. Finite group approach to upper bounds. Cohn and Umans [23] have proposed a
different approach to constructing algorithms for matrix multiplication using the discrete Fourier
transform and the representation theory of finite groups.

Let $G$ be a finite group and $\mathbb{C}[G]$ its group algebra. (See e.g., [19] for definitions and properties
of the group algebra.) The discrete Fourier transform (DFT) $D : \mathbb{C}[G] \to \mathbb{C}^{|G|}$ is an invertible
linear map that actualizes Wedderburn's theorem that $\mathbb{C}[G] \simeq \operatorname{Mat}_{d_1 \times d_1}(\mathbb{C}) \times \cdots \times \operatorname{Mat}_{d_k \times d_k}(\mathbb{C})$,
where $G$ has $k$ irreducible representations and the dimension (character) of the $j$-th is $d_j$. (See
e.g., [33] for an exposition.) Thus multiplication in the group ring is reduced to multiplication
of $d_1 \times d_1, \ldots, d_k \times d_k$ matrices.

The idea is, to multiply $\operatorname{Mat}_{n \times m} \times \operatorname{Mat}_{m \times p} \to \operatorname{Mat}_{n \times p}$ one first bijectively maps bases of each
of these three spaces into subsets of some finite group $G$. The subsets are themselves formed
from three subsets $S_1, S_2, S_3$ of cardinalities $n, m, p$ which have a disjointness property, called
the triple product property in [23]: if $s_1 s_2 s_3 = \operatorname{Id}$, with $s_j \in S_j^{-1} S_j$, then each $s_j = \operatorname{Id}$. Then
the maps are to the three subsets $S_1^{-1} S_2$, $S_2^{-1} S_3$, $S_1^{-1} S_3$. The triple product property enables
one to read off matrix multiplication from multiplication in the group ring. They then show, if
$\omega$ is the exponent of matrix multiplication, that, if one can find such a group and subsets, then

\[ (nmp)^{\omega/3} \leq d^{\omega - 2} |G| \]

where $d$ is the largest character degree of $G$ (the largest of the $d_j$). So one needs to find groups that are big enough to support
triples satisfying the triple product property but as small as possible and with largest character degree
as small as possible.
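
The triple product property can be tested by brute force in a small group. The sketch below is a toy illustration under assumptions made here: the group is the abelian group $\mathbb{Z}_n \times \mathbb{Z}_m \times \mathbb{Z}_p$ encoded as tuples, and all function names are hypothetical. It is not one of the constructions of [22] or [23].

```python
from itertools import product

def make_abelian_group(moduli):
    """Multiplication, inverse and identity for Z_{n_1} x ... x Z_{n_k}, written as tuples."""
    def mul(x, y):
        return tuple((xi + yi) % n for xi, yi, n in zip(x, y, moduli))
    def inv(x):
        return tuple((-xi) % n for xi, n in zip(x, moduli))
    return mul, inv, tuple(0 for _ in moduli)

def quotient_set(S, mul, inv):
    """All products s^{-1} s' with s, s' in S (the set S^{-1} S of the text)."""
    return {mul(inv(s), t) for s in S for t in S}

def has_triple_product_property(S1, S2, S3, mul, inv, identity):
    """Check: s1 s2 s3 = Id with s_j in S_j^{-1} S_j forces s1 = s2 = s3 = Id."""
    Q1, Q2, Q3 = (quotient_set(S, mul, inv) for S in (S1, S2, S3))
    for s1, s2, s3 in product(Q1, Q2, Q3):
        if mul(mul(s1, s2), s3) == identity and not (s1 == s2 == s3 == identity):
            return False
    return True

n, m, p = 2, 3, 4
mul, inv, identity = make_abelian_group((n, m, p))
S1 = [(i, 0, 0) for i in range(n)]
S2 = [(0, j, 0) for j in range(m)]
S3 = [(0, 0, k) for k in range(p)]
print(has_triple_product_property(S1, S2, S3, mul, inv, identity))
print(has_triple_product_property(S1, S1, S3, mul, inv, identity))
```

For this abelian example $d = 1$ and $|G| = nmp$, so the displayed inequality only gives $\omega \leq 3$; improving on that requires nonabelian groups, as in the examples of [22].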
In [22] they give explicit examples which recover $\omega < 2.41$ and state several combinatorial and
group theoretic conjectures that, if true, would imply $\omega = 2$.

9. Dimensions of secant varieties of Segre varieties 

The most basic invariant of an algebraic variety is its dimension. In this section we discuss 
the standard tool for computing dimensions of secant varieties of projective varieties and its 
application to secant varieties of Segre varieties. The results of this section are not used in the 
following sections. 

9.1. Dimensions of secant varieties of Segre varieties and matrix multiplication. Let
$A, B, C$ be vector spaces of dimensions $a, b, c$. By Remark 3.2 the expected dimension of
$\sigma_r(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ is $r(a - 1 + b - 1 + c - 1) + r - 1 = r(a + b + c - 2) - 1$. The dimension
of the ambient space is $abc - 1$, so we expect $\sigma_r(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ to fill $\mathbb{P}(A \otimes B \otimes C)$ as
soon as $r(a + b + c - 2) - 1 \geq abc - 1$, i.e.,

\[ r \geq \frac{abc}{a + b + c - 2}. \tag{7} \]

Note that in the case $a = b = c$ equation (7) becomes $r \geq a^3/(3a - 2) \simeq a^2/3$. Taking
$a = n^2$, the right hand side of (7) is roughly $n^4/3$, showing already that matrix multiplication
is far from being a generic bilinear map, as even the standard algorithm gives $R(M_{n,n,n}) \leq n^3$.
(The actual typical X-rank cannot be smaller than the expected typical X-rank.) However for
$n = 2$ we obtain $r \geq 64/10$ and thus $r = 7$ is expected to (and we will see below does) fill, so
$M_{2,2,2}$ is generic in this sense.
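
The arithmetic behind (7) is easy to tabulate. The following sketch (the function name is ours) computes the smallest integer $r$ allowed by (7) and compares it with $n^3$ for the matrix multiplication format $a = b = c = n^2$.

```python
def expected_typical_rank(a, b, c):
    """Smallest integer r with r (a + b + c - 2) >= a b c, as in (7)."""
    denominator = a + b + c - 2
    return -(-(a * b * c) // denominator)   # ceiling division with integers

for n in (2, 3, 4):
    size = n * n
    print(n, expected_typical_rank(size, size, size), n ** 3)
```

This is only the expected value: it ignores defective cases such as $a = b = c = 3$, where (7) gives 4 although, as noted after Corollary 6.2, the fourth secant variety is a hypersurface.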



9.2. Terracini's lemma and applications. Recall the notations from the beginning of §3.1
and adopt the additional notation that for $Z \subset \mathbb{P}V$, $\hat{T}_{[z]} Z = \hat{T}_z Z \subset V$ is the embedded tangent
space to $Z$ at $[z] \in Z$.

Lemma 9.1 (Terracini's Lemma (see, e.g., [21], [33], [59])). If $[x] \in J(Y, Z)_{smooth}$ with $[x] = [y + z]$,
such that $[y] \in Y_{smooth}$, $[z] \in Z_{smooth}$, then

\[ \hat{T}_{[x]} J(Y, Z) = \hat{T}_{[y]} Y + \hat{T}_{[z]} Z. \]

Thus, if $[p] = [x_1 + \cdots + x_r] \in \sigma_r(X)_{smooth}$ with $[x_j] \in X_{smooth}$, then

\[ \hat{T}_{[p]} \sigma_r(X) = \hat{T}_{[x_1]} X + \cdots + \hat{T}_{[x_r]} X. \]

Terracini's lemma implies that for a variety $X \subset \mathbb{P}V$, if any given $\sigma_r(X)$ is nondegenerate
(i.e. of the expected dimension) and of dimension $r \dim X + r - 1$, then all $\sigma_{r'}(X)$ for $r' < r$ are
nondegenerate.

Thus one can show all secant varieties of $X$ are nondegenerate if one shows $\sigma_p(X) = \mathbb{P}V$ when
$\dim \mathbb{P}V = p(n - 1) + p - 1$.
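
Terracini's lemma turns the dimension of a secant variety of a Segre variety into a rank computation. The sketch below estimates that dimension at random (hence, presumably general) points; the function names, the real Gaussian sampling and the tolerance are assumptions made for this illustration.

```python
import numpy as np

def segre_tangent_space(a, b, c):
    """Rows spanning the affine tangent space to the Segre at a (x) b (x) c:
    A (x) b (x) c + a (x) B (x) c + a (x) b (x) C."""
    rows = [np.einsum('i,j,k->ijk', e, b, c).ravel() for e in np.eye(len(a))]
    rows += [np.einsum('i,j,k->ijk', a, e, c).ravel() for e in np.eye(len(b))]
    rows += [np.einsum('i,j,k->ijk', a, b, e).ravel() for e in np.eye(len(c))]
    return np.array(rows)

def secant_variety_dimension(dims, r, rng, tol=1e-9):
    """Projective dimension of sigma_r(Seg), via the span of r tangent spaces."""
    blocks = []
    for _ in range(r):
        a, b, c = (rng.standard_normal(d) for d in dims)
        blocks.append(segre_tangent_space(a, b, c))
    return int(np.linalg.matrix_rank(np.vstack(blocks), tol=tol)) - 1

rng = np.random.default_rng(3)
for r in (3, 4, 5):
    print((3, 3, 3), r, secant_variety_dimension((3, 3, 3), r, rng))
print((4, 4, 4), 7, secant_variety_dimension((4, 4, 4), 7, rng))
```

For the format (3, 3, 3) the value at $r = 4$ should come out as 25 rather than the expected 26, in line with the remark after Corollary 6.2 that this secant variety is a hypersurface.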

The following trick occurs frequently in the literature: let $Y_1, \ldots, Y_p \subset X$, so $\hat{T}_{y_1} Y_1 + \cdots +
\hat{T}_{y_p} Y_p \subset \hat{T}_{[y_1 + \cdots + y_p]} \sigma_p(X)$. If one can show $\hat{T}_{y_1} Y_1 + \cdots + \hat{T}_{y_p} Y_p = V$, one has shown $\sigma_p(X) = \mathbb{P}V$.

Lickteig and Strassen show that for $X = \operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)$, remarkably just taking the $Y_i$
to be the Segre itself at most three times and taking the other $Y_i$ to be linear spaces in it is
sufficient for certain cases:

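Lemma 9.2 (Lickteig [S]). Adopt the notation $\mathbb{P}A_i = \mathbb{P}(A \otimes b_i \otimes c_i) \subset \operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)$,
$\mathbb{P}B_j = \mathbb{P}(a_j \otimes B \otimes c'_j) \subset \operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)$.

(1) We may choose points $a_1, \ldots, a_s \in A$, $b_1, \ldots, b_q \in B$, $c_1, \ldots, c_q, c'_1, \ldots, c'_s \in C$, such that
\[ J(\mathbb{P}A_1, \ldots, \mathbb{P}A_q, \mathbb{P}B_1, \ldots, \mathbb{P}B_s) = A \otimes B \otimes C \]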






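when $q = b l_1$, $s = a l_2$ and $c = l_1 + l_2$, and when $a = b = 2$, $q + s = 2c$, $s, q \geq 2$.

(2) We may choose points $a_1, \ldots, a_s \in A$, $b_1, \ldots, b_q \in B$, $c_1, \ldots, c_q, c'_1, \ldots, c'_s \in C$, such that

\[ J(\sigma_2(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)), \mathbb{P}A_1, \ldots, \mathbb{P}A_q, \mathbb{P}B_1, \ldots, \mathbb{P}B_s) = A \otimes B \otimes C \]

when $q + s + 2 = c$ and $a = b = 2$.

(3) We may choose points $a_1, \ldots, a_s \in A$, $b_1, \ldots, b_q \in B$,
$c_1, \ldots, c_q, c'_1, \ldots, c'_s \in C$, such that

\[ J(\sigma_3(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)), \mathbb{P}A_1, \ldots, \mathbb{P}A_q, \mathbb{P}B_1, \ldots, \mathbb{P}B_s) = A \otimes B \otimes C \]

when $q = s = c - 2 \geq 2$ and $a = b = 3$.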

Using Lemma 9.2, Lickteig shows

Theorem 9.3 (Lickteig 03]). $\sigma_r(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$ is nondegenerate for all $r$ whenever
$a \leq b \leq c$, $b, c$ are even and $abc/(a + b + c - 2)$ is an integer.

With a little more work one obtains Theorem 3.4.

A classical technique for showing a secant variety of any variety $X \subset \mathbb{P}V$ is degenerate is to
find a variety $Y \subset \mathbb{P}V$, with $X \subset Y$, with $\sigma_k(Y)$ very degenerate. Then, if $X$ "catches up", i.e., if
there exists $r$ such that $\sigma_r(X) = \sigma_r(Y)$, then $\sigma_t(X) = \sigma_t(Y)$ for all $t > r$ as well. (See, e.g. pjj]
for a recent application.) To see this, first note that for $u < r$, $\sigma_r(X) = J(\sigma_{r-u}(X), \sigma_u(X)) \subset
J(\sigma_{r-u}(Y), \sigma_u(X)) \subset \sigma_r(Y)$, so $\sigma_r(Y) = J(\sigma_{r-u}(Y), \sigma_u(X))$. Now write $t = mr + u$,

\[
\begin{aligned}
\sigma_t(X) &= J(\sigma_{mr}(X), \sigma_u(X)) \\
&= J(\sigma_{(m-1)r}(Y), \sigma_u(Y), J(\sigma_{r-u}(Y), \sigma_u(X))) \\
&= \sigma_{mr+u}(Y).
\end{aligned}
\]

In particular, since $\sigma_r(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B))$ is very degenerate, if we have a three factor case that
is "unbalanced" in the sense that one space is much smaller than the others, it can catch up to a
corresponding two factor case. For example $\sigma_2(\operatorname{Seg}(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^3)) = \sigma_2(\operatorname{Seg}(\mathbb{P}(\mathbb{C}^2 \otimes \mathbb{C}^2) \times \mathbb{P}^3))$.
Note that when this catching up occurs, if one knows the ideal of the a priori larger variety, one
obtains the ideals of the secant varieties of the smaller variety. Other uses of auxiliary varieties
to understand the secant varieties of Segre varieties are discussed in §12.

In the past few years there have been several papers on the dimensions of secant varieties of 
Segre varieties, e.g., [T71 [El [TU CE1 [l]. These papers use methods similar to those of Strassen 
and Lickteig, but the language is more geometric (fat points, degeneration arguments). Some 
explanation of the relation between the algebreo-geometric and tensor language is given in pQ. 

With such steady progress, it seems reasonable to hope for a complete solution for the secant
defectivity of Segre varieties in the near future, at least in the three factor case.

10. Invariant description of Strassen's equations and generalizations 

In this section we first rephrase Strassen's equations as the image of a GL(A) x GL(B) x 
GL(C)-equivariant map. We use this rephrasing to describe how to explicitly write a basis of 
his equations in a "good" basis and to generalize his equations. To ease the reader into this 
perspective, we begin with a familiar case. 

10.1. Warm up: Invariant description of generators of the ideal of $\sigma_r(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B))$.
The set of $a \times b$ matrices of rank at most $r$ is the zero set of the $(r + 1) \times (r + 1)$ minors;
in fact these minors generate the ideal of $\sigma_r(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B))$. To understand this space of
equations invariantly, we begin with two by two minors. Choose bases $\{a_i\}$ of $A$, $\{b_s\}$ of $B$
and write our resulting matrix representing a point of $A \otimes B$ as $X = (x^i_s)$. Consider the minor
$P_{ij,st} := x^i_s x^j_t - x^i_t x^j_s \in S^2(A \otimes B)^*$. Note that $P_{ij,st} = -P_{ji,st}$ and $P_{ij,st} = -P_{ij,ts}$. Hence
$P_{ij,st} \in \Lambda^2 A^* \otimes \Lambda^2 B^*$, and in fact we have an injective map

\[ \Lambda^2 A^* \otimes \Lambda^2 B^* \to S^2(A \otimes B)^* \]

whose image is the space of 2 x 2 minors. By the same reasoning, there is an injective map
$\Lambda^d A^* \otimes \Lambda^d B^* \to S^d(A \otimes B)^*$ with image the $d \times d$ minors. We conclude

The ideal of $\sigma_r(\operatorname{Seg}(\mathbb{P}A \times \mathbb{P}B))$ is generated by $\Lambda^{r+1} A^* \otimes \Lambda^{r+1} B^* \subset S^{r+1}(A \otimes B)^*$.
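
In coordinates, the conclusion above says that a matrix has rank at most $r$ exactly when all its $(r+1) \times (r+1)$ minors vanish, and that the number of such minors is $\dim \Lambda^{r+1} A^* \otimes \Lambda^{r+1} B^*$. A minimal numerical sketch, with function names chosen here for illustration:

```python
import numpy as np
from itertools import combinations
from math import comb

def minors(X, d):
    """All d x d minors of X, indexed by pairs (row subset, column subset)."""
    a, b = X.shape
    return np.array([np.linalg.det(X[np.ix_(I, J)])
                     for I in combinations(range(a), d)
                     for J in combinations(range(b), d)])

rng = np.random.default_rng(4)
a, b, r = 4, 5, 2
X = rng.standard_normal((a, r)) @ rng.standard_normal((r, b))   # a random matrix of rank r
print(len(minors(X, r + 1)), comb(a, r + 1) * comb(b, r + 1))
print(np.abs(minors(X, r + 1)).max(), np.abs(minors(X, r)).max())
```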

We will see in §11 that $\Lambda^{r+1} A^* \otimes \Lambda^{r+1} B^*$ is an irreducible $GL(A) \times GL(B)$-submodule of
$S^{r+1}(A \otimes B)^*$. A more precise goal than "finding equations for secant varieties of Segre varieties"
is to find the irreducible modules generating their ideals. When we discuss finding invariant
descriptions of sets of equations, ultimately we will mean as modules, but in the interim, we
can simply mean "without reference to choices of bases", such as we have done here for the
$(r + 1) \times (r + 1)$ minors.

10.2. Strassen's equations reconsidered. In order to understand Strassen's equations
invariantly, we would like to get rid of the choices of $\alpha, \alpha^1, \alpha^2$, and the requirement that $\alpha$ is
such that $T(\alpha)$ be invertible in Theorem 6.1. In what follows we will deal with tensors instead
of endomorphisms: composition of endomorphisms will correspond to contractions of tensors,
and the commutator of two endomorphisms will correspond to contracting a tensor in two
different ways and taking the difference of the two results. Note that matrix multiplication
$M : (U^* \otimes V) \times (V^* \otimes W) \to U^* \otimes W$ itself is simply the contraction of $V$ with $V^*$.

A linear map $f : V \to W$ induces linear maps $f^{\wedge k} : \Lambda^k V \to \Lambda^k W$. If $\dim V = \dim W = n$
then, letting $\det(f) := f^{\wedge n}$, we have $f^{\wedge (n-1)} = f^{-1} \otimes \det(f)$, which follows from the canonical
identification $\Lambda^{n-1} V \simeq V^* \otimes \Lambda^n V$.
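
In coordinates, the entries of $f^{\wedge (n-1)}$ are the $(n-1) \times (n-1)$ minors of $f$, and the identity above becomes the classical statement that the adjugate of a matrix equals its determinant times its inverse. The sketch below, which assumes NumPy and uses the hypothetical helper name `cofactor_matrix`, checks this on a random $4 \times 4$ matrix; the transpose and the signs $(-1)^{i+j}$ are the coordinate form of the identification and are conventions chosen for this illustration.

```python
import numpy as np


def cofactor_matrix(F):
    """Matrix of signed (n-1) x (n-1) minors of the square matrix F."""
    n = F.shape[0]
    cofactors = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Delete row i and column j, then take the determinant.
            minor = np.delete(np.delete(F, i, axis=0), j, axis=1)
            cofactors[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cofactors


rng = np.random.default_rng(1)
F = rng.standard_normal((4, 4))  # invertible with probability one

adjugate = cofactor_matrix(F).T
identity_holds = np.allclose(adjugate, np.linalg.det(F) * np.linalg.inv(F))
```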

The punch line of this section is

Strassen's equations correspond to the image of the composition of the inclusion

$$\Lambda^2 A \otimes S^{b-1} A \otimes \Lambda^b B \otimes B \otimes \Lambda^b C \otimes C \to (A \otimes B \otimes C)^{\otimes (b+1)}$$

with the projection

$$(A \otimes B \otimes C)^{\otimes (b+1)} \to S^{b+1}(A \otimes B \otimes C).$$

We remark that the composition of these two maps is not injective. In §11.2 we describe
the image precisely. We emphasize this perspective because it leads to vast generalizations of
Strassen's equations, discussed in §10.4.

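Given $T \in A \otimes B \otimes C$, recall our notation $T_\alpha \in B \otimes C$. We have
$T_\alpha^{\wedge (b-1)} \in \Lambda^{b-1} B \otimes \Lambda^{b-1} C = \Lambda^{b-1} B \otimes C^* \otimes \Lambda^b C$.
We may wedge the $\Lambda^{b-1} B$ and $B$ factors in

$$T_\alpha^{\wedge (b-1)} \otimes T_{\alpha^j} \in \Lambda^{b-1} B \otimes C^* \otimes \Lambda^b C \otimes B \otimes C$$

together to obtain an element

$$T_\alpha^{\alpha^j} \in \Lambda^b B \otimes C^* \otimes \Lambda^b C \otimes C = C^* \otimes C \otimes \Lambda^b B \otimes \Lambda^b C.$$

That is, up to tensoring with a one-dimensional vector space, we have linear maps $C \to C$
and can now take their commutators. Consider

$$T_\alpha^{\alpha^1} \otimes T_\alpha^{\alpha^2} \in (\Lambda^b B \otimes C^* \otimes \Lambda^b C \otimes C)^{\otimes 2} = C^* \otimes C \otimes C^* \otimes C \otimes (\Lambda^b B)^{\otimes 2} \otimes (\Lambda^b C)^{\otimes 2}$$

and contract a copy of $C$ from $T_\alpha^{\alpha^1}$ with a copy of $C^*$ from $T_\alpha^{\alpha^2}$ to obtain an element of
$C^* \otimes C \otimes (\Lambda^b B)^{\otimes 2} \otimes (\Lambda^b C)^{\otimes 2}$. This contraction corresponds to the matrix multiplication of $T_\alpha^{\alpha^1}$
with $T_\alpha^{\alpha^2}$, and reversing the roles of $T_\alpha^{\alpha^1}$, $T_\alpha^{\alpha^2}$ reverses the order of the matrix multiplication.
Thus the difference of these two contractions is

$$[T_\alpha^{\alpha^1}, T_\alpha^{\alpha^2}] \in C^* \otimes C \otimes (\Lambda^b B)^{\otimes 2} \otimes (\Lambda^b C)^{\otimes 2}$$

and Strassen's theorem states that the rank of $[T_\alpha^{\alpha^1}, T_\alpha^{\alpha^2}]$ is at most $2(r - b)$.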
With a little more care, one obtains a lower degree tensor, see for details. 

Remark 10.1. Strassen's equations were rediscovered in [6], guided by the geometry of phylo- 
genetic trees, which also enabled a nice presentation of them. The recent preprint [35] gives 
an even simpler description of Strassen's equations. Unfortunately the generalizations discussed 
below are not evident from either of these presentations. 

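10.3. Explicit polynomials in bases. Here are polynomials corresponding to Strassen's commutator
being of rank at most $w$: Let $\alpha^1, \alpha^2, \alpha^3$ be a basis of $A^*$, and let $\beta_1, \dots, \beta_b$, $\xi_1, \dots, \xi_b$ be bases of
$B^*$, $C^*$. Consider the element

$$P = \alpha^2 \wedge \alpha^3 \otimes (\alpha^1)^{b-1} \otimes \beta_1 \wedge \cdots \wedge \beta_b \otimes \beta_s \otimes \xi_1 \wedge \cdots \wedge \xi_b \otimes \xi_t.$$

This expands to (ignoring scalars)

$$(\alpha^2 \otimes \alpha^3 - \alpha^3 \otimes \alpha^2) \otimes (\alpha^1)^{b-1} \otimes \Big(\sum_j (-1)^{j+1} \beta_{\hat{j}} \otimes \beta_j \otimes \beta_s\Big) \otimes \Big(\sum_k (-1)^{k+1} \xi_{\hat{k}} \otimes \xi_k \otimes \xi_t\Big)$$

$$= \sum_{j,k} (-1)^{j+k} \Big[ \big((\alpha^1)^{b-1} \otimes \beta_{\hat{j}} \otimes \xi_{\hat{k}}\big) \otimes (\alpha^2 \otimes \beta_j \otimes \xi_t) \otimes (\alpha^3 \otimes \beta_s \otimes \xi_k) - \big((\alpha^1)^{b-1} \otimes \beta_{\hat{j}} \otimes \xi_{\hat{k}}\big) \otimes (\alpha^3 \otimes \beta_j \otimes \xi_t) \otimes (\alpha^2 \otimes \beta_s \otimes \xi_k) \Big].$$

A hat over an index indicates the wedge product of all vectors in that index range except the
hatted one. If we choose dual bases for $A, B, C$ and write $T = a_1 \otimes X + a_2 \otimes Y + a_3 \otimes Z$ where
the $a_j$ are dual to the $\alpha^j$ and $X, Y, Z$ are represented as $b \times b$ matrices with respect to the dual
bases of $B, C$, then let $P(T)$ be the matrix with

$$P(T)^s_t = \sum_{j,k} (-1)^{j+k} \big(\det X^{\hat{j}}_{\hat{k}}\big) \big(Y^j_t Z^s_k - Y^s_k Z^j_t\big),$$

where $X^{\hat{j}}_{\hat{k}}$ is $X$ with its $j$-th row and $k$-th column removed. Strassen's commutator has rank
at most $w$ if and only if all the $(w+1) \times (w+1)$ minors of $P(T)$ are zero. It turns out that
when one takes the determinant of $P(T)$, one gets a reducible polynomial that is divisible by
the determinant of $X$, so, e.g., when $b = 3$ one obtains an irreducible polynomial of degree nine
(as opposed to 12).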
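
The rank condition can be tested numerically. The sketch below is an illustration under stated assumptions, not the construction of the text verbatim: it assumes NumPy, works over the reals, takes $\dim A = 3$, and uses the matrix $Y \,\mathrm{adj}(X)\, Z - Z \,\mathrm{adj}(X)\, Y$, which for invertible $X$ equals $\det(X)\, X\, [X^{-1}Y, X^{-1}Z]$ and therefore has the same rank as Strassen's commutator. All function names are hypothetical.

```python
import numpy as np


def adjugate(X):
    """Adjugate of a square matrix, so that X @ adjugate(X) = det(X) * I."""
    n = X.shape[0]
    cofactors = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(X, i, axis=0), j, axis=1)
            cofactors[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cofactors.T


def strassen_matrix(T):
    """Return Y adj(X) Z - Z adj(X) Y and a scale for rank tolerances."""
    X, Y, Z = T  # the three b x b slices of T along the A factor
    first = Y @ adjugate(X) @ Z
    second = Z @ adjugate(X) @ Y
    scale = np.linalg.norm(first) + np.linalg.norm(second)
    return first - second, scale


def numerical_rank(M, scale, rel_tol=1e-8):
    """Number of singular values of M exceeding rel_tol * scale."""
    singular_values = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * scale))


def random_tensor_of_rank_at_most(r, b, rng):
    """Sum of r random decomposable tensors in R^3 (x) R^b (x) R^b."""
    u = rng.standard_normal((r, 3))
    v = rng.standard_normal((r, b))
    w = rng.standard_normal((r, b))
    return np.einsum("ri,rj,rk->ijk", u, v, w)


rng = np.random.default_rng(0)
ranks = {}
for b, r in [(3, 3), (3, 4), (4, 4), (4, 5)]:
    P, scale = strassen_matrix(random_tensor_of_rank_at_most(r, b, rng))
    ranks[(b, r)] = numerical_rank(P, scale)
```

Each entry `ranks[(b, r)]` should be at most $2(r - b)$. The tolerance is measured against the size of the two products rather than against the matrix itself, because for $r = b$ the matrix is zero up to rounding and a relative threshold would be meaningless.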

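10.4. Generalizations of Strassen's conditions. The key point in the discussion above
was that contracting $T$ in two different ways yielded tensors that commute if $T$ is in
$\sigma_r(Seg(\mathbb{P}A^* \times \mathbb{P}B^* \times \mathbb{P}C^*))$. Consider, for $s, t$ such that $s + t \le b$ and $\alpha, \alpha^j \in A^*$, the tensors

$$T_{\alpha^j}^{\wedge s} \in \Lambda^s B \otimes \Lambda^s C, \qquad T_\alpha^{\wedge t} \in \Lambda^t B \otimes \Lambda^t C$$

(in §10.2 we had $s = 1$, $t = b - 1$). We contract $T_{\alpha^1}^{\wedge s} \otimes T_\alpha^{\wedge t} \otimes T_{\alpha^2}^{\wedge s}$ to obtain elements of
$\Lambda^{s+t} B \otimes \Lambda^{s+t} C \otimes \Lambda^s B \otimes \Lambda^s C$
in two different ways; call these contractions $\psi^{s,t}_{\alpha,\alpha^1,\alpha^2}(T)$ and $\psi^{s,t}_{\alpha,\alpha^2,\alpha^1}(T)$.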

Now say $R(T) = r$ so we may write $T = a_1 \otimes b_1 \otimes c_1 + \cdots + a_r \otimes b_r \otimes c_r$ for elements $a_i \in A$,
$b_i \in B$, $c_i \in C$. We have

$$\psi^{s,t}_{\alpha,\alpha^1,\alpha^2}(T) = \sum_{|I|=s,\,|J|=t,\,|K|=s} (a_I, \alpha^1)(a_J, \alpha)(a_K, \alpha^2)\,(b_{I+J} \otimes b_K) \otimes (c_I \otimes c_{J+K}),$$

where $a_I = a_{i_1} \wedge \cdots \wedge a_{i_s} \in \Lambda^s A$, $(a_I, \alpha) \in \Lambda^{s-1} A$ and $a_{I+J} = a_I \wedge a_J$ etc. For this to be nonzero,
we need $I$ and $J$ to be disjoint subsets of $\{1, \dots, r\}$. Similarly, $J$ and $K$ must be disjoint. If
$s + t = r$ this implies $I = K$. In summary:

Theorem 10.2. For $T \in \sigma_{s+t}(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$, for all $\alpha, \alpha^1, \alpha^2 \in A^*$,

$$\psi^{s,t}_{\alpha,\alpha^1,\alpha^2}(T) - \psi^{s,t}_{\alpha,\alpha^2,\alpha^1}(T) = 0.$$

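We have the bilinear map

$$(\Lambda^2(S^s A) \otimes S^t A)^* \times (A \otimes B \otimes C)^{\otimes (2s+t)} \to \Lambda^{s+t} B \otimes \Lambda^{s+t} C \otimes \Lambda^s B \otimes \Lambda^s C,$$

whose image is $\psi^{s,t}_{\alpha,\alpha^1,\alpha^2}(T) - \psi^{s,t}_{\alpha,\alpha^2,\alpha^1}(T)$. We rewrite it as a polynomial map

$$\Psi^{s,t} : A \otimes B \otimes C \to (\Lambda^2(S^s A) \otimes S^t A) \otimes \Lambda^{s+t} B \otimes \Lambda^{s+t} C \otimes \Lambda^s B \otimes \Lambda^s C.$$

So just as with Strassen's equations, we no longer need to make choices of elements of $A^*$.

The only catch is we don't know whether or not $\Psi^{s,t}$ is identically zero. In [39] we show many
of the $\Psi^{s,t}$ are indeed nonzero and give independent subspaces (in fact independent $GL(A) \times
GL(B) \times GL(C)$-submodules, see §11) of the ideal of $\sigma_{s+t}(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$.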

In [39], Corollary 5.6, using the above methods, we show that set-theoretic defining equations
for $\sigma_4(Seg(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3))$, the case of interest for phylogenetic invariants, could be explicitly
determined if one had a complete set of defining equations for $\sigma_4(Seg(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^3))$.

11. Representation theory and equations for secant varieties of Segre varieties 

As mentioned in the introduction, the most important tool for studying varieties invariant 
under a group action is representation theory. In this section we develop the necessary representation
theory for studying secant varieties of Segre varieties. The theory developed in this section
is also what is needed in the more general study of algebraic statistical models. We first describe
how to decompose the space of polynomials on $A_1 \otimes \cdots \otimes A_n$ into subspaces invariant under the
action of the group of changes of bases in the vector spaces, $GL(A_1) \times \cdots \times GL(A_n)$. We then
describe Strassen's equations from this perspective and how to find preferred polynomials in 
each irreducible submodule. We also describe two notions, inheritance and prolongation, which 
facilitate our study. Once one has an explicit description of a space of polynomials as modules,
it is algorithmic to write down an explicit basis of the module as we did in §10.1. See [37^ HQ]
for more details.

11.1. Polynomials come in modules. Since $\sigma_r(Seg(\mathbb{P}A_1 \times \cdots \times \mathbb{P}A_n))$ is invariant under
the action of $G = GL(A_1) \times \cdots \times GL(A_n)$ acting on $A_1 \otimes \cdots \otimes A_n = V$, its ideal, which is
a subset of the module $\bigoplus_d S^d V^*$, must be as well. Thus we should study the equations of
$\sigma_r(Seg(\mathbb{P}A_1 \times \cdots \times \mathbb{P}A_n))$ as $G$-modules.

Given any G-module W, the first thing to do when studying W is to try to decompose it into 
isotypic components (which is always possible when G is reductive, as is our situation). That is, 
one can decompose W into a direct sum of irreducible modules, but this is not canonical. The 
isotypic decomposition (which is canonical) is obtained from the decomposition into irreducible 
submodules by grouping together all copies of isomorphic irreducible submodules. 

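To decompose $S^d V^*$ into $G$-isotypic components we use the Schur-Weyl duality between
representations of the symmetric group on $d$ letters $\mathfrak{S}_d$ and the representations of the general linear
group $GL(W)$. Both groups act on $W^{\otimes d}$: for $A \in GL(W)$ and $\sigma \in \mathfrak{S}_d$ we respectively have

$$A.(v_1 \otimes \cdots \otimes v_d) = (A.v_1) \otimes \cdots \otimes (A.v_d)$$

$$\sigma.(v_1 \otimes \cdots \otimes v_d) = v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(d)}$$

Schur-Weyl duality is the statement that each group is the commuting subgroup of the other,
that is

$$\mathfrak{S}_d = \{ g \in GL(W^{\otimes d}) \mid g.A.(v_1 \otimes \cdots \otimes v_d) = A.g.(v_1 \otimes \cdots \otimes v_d) \ \forall A \in GL(W), \forall v_1, \dots, v_d \in W \}$$

and

$$GL(W) = \{ g \in GL(W^{\otimes d}) \mid g.\sigma.(v_1 \otimes \cdots \otimes v_d) = \sigma.g.(v_1 \otimes \cdots \otimes v_d) \ \forall \sigma \in \mathfrak{S}_d, \forall v_1, \dots, v_d \in W \}.$$

Thus we can use the action of $\mathfrak{S}_d$ to obtain projection operators $W^{\otimes d} \to W^{\otimes d}$, whose images
are necessarily $GL(W)$-submodules. Moreover, the duality assures us that all $GL(W)$-submodules
may be obtained this way. For example

$$S^d W = \{ T \in W^{\otimes d} \mid \sigma(T) = T \ \forall \sigma \in \mathfrak{S}_d \} = \mathrm{Im}\, \pi_S : W^{\otimes d} \to W^{\otimes d}, \quad \text{where } \pi_S(w_1 \otimes \cdots \otimes w_d) = \frac{1}{d!} \sum_{\sigma \in \mathfrak{S}_d} w_{\sigma(1)} \otimes \cdots \otimes w_{\sigma(d)},$$

$$\Lambda^d W = \{ T \in W^{\otimes d} \mid \sigma(T) = \mathrm{sgn}(\sigma) T \ \forall \sigma \in \mathfrak{S}_d \} = \mathrm{Im}\, \pi_\Lambda : W^{\otimes d} \to W^{\otimes d}, \quad \text{where } \pi_\Lambda(w_1 \otimes \cdots \otimes w_d) = \frac{1}{d!} \sum_{\sigma \in \mathfrak{S}_d} \mathrm{sgn}(\sigma)\, w_{\sigma(1)} \otimes \cdots \otimes w_{\sigma(d)}.$$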

Let $\pi = (p_1, \dots, p_f)$ be a partition of $d$, i.e., $p_1 \ge \cdots \ge p_f$ and $p_1 + \cdots + p_f = d$. We use the
notations $|\pi| = d$ and $l(\pi) = f$.

The irreducible representations of $\mathfrak{S}_d$ are indexed by partitions of $d$; we let $[\pi]$ denote the
module associated to $\pi$. Here $[\pi]$ may be obtained by a choice of Young symmetrizer $c_\pi$
corresponding to a choice of a Young tableau associated to $\pi$ and applying the projection operator
$c_\pi$ to the group algebra $\mathbb{C}[\mathfrak{S}_d]$ (see, e.g., [27], chapter four).

Define $S_\pi W := \mathrm{Hom}_{\mathfrak{S}_d}([\pi], W^{\otimes d})$, which is an irreducible $GL(W)$-module. The $GL(W)$-isotypic
decomposition of $W^{\otimes d}$ is $W^{\otimes d} = \bigoplus_{|\pi| = d} [\pi] \otimes S_\pi W$. The first factor is a trivial $GL(W)$-module
so it only serves to tell us the multiplicity of the second, which is $\dim [\pi]$.
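
A quick consistency check of this decomposition is to compare dimensions: $n^d = \sum_{|\pi| = d} \dim[\pi] \cdot \dim S_\pi \mathbb{C}^n$. The sketch below uses two standard formulas that are not stated in the text, the hook length formula for $\dim[\pi]$ and the hook content formula for $\dim S_\pi \mathbb{C}^n$; the function names are hypothetical and chosen for this illustration.

```python
from math import factorial


def partitions(d, largest=None):
    """Yield the partitions of d as non-increasing tuples of positive integers."""
    if largest is None:
        largest = d
    if d == 0:
        yield ()
        return
    for first in range(min(d, largest), 0, -1):
        for rest in partitions(d - first, first):
            yield (first,) + rest


def hook_lengths(pi):
    """Map each box (row, column) of the Young diagram of pi to its hook length."""
    column_heights = [sum(1 for p in pi if p > j) for j in range(pi[0])]
    return {
        (i, j): (pi[i] - j) + (column_heights[j] - i) - 1
        for i in range(len(pi))
        for j in range(pi[i])
    }


def dim_specht(pi):
    """dim [pi], by the hook length formula."""
    product = 1
    for hook in hook_lengths(pi).values():
        product *= hook
    return factorial(sum(pi)) // product


def dim_schur(pi, n):
    """dim S_pi(C^n), by the hook content formula (zero when l(pi) > n)."""
    numerator, denominator = 1, 1
    for (i, j), hook in hook_lengths(pi).items():
        numerator *= n + j - i  # n plus the content of the box
        denominator *= hook
    return numerator // denominator


def isotypic_dimension_sum(n, d):
    """Sum over partitions pi of d of dim[pi] * dim S_pi(C^n)."""
    return sum(dim_specht(pi) * dim_schur(pi, n) for pi in partitions(d))
```

For instance, `isotypic_dimension_sum(3, 4)` should equal $3^4$, and `dim_specht((2, 1, 1))` gives the multiplicity with which $S_{2,1,1}W$ occurs in $W^{\otimes 4}$.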

We now return to the space we are interested in, $V = A_1 \otimes \cdots \otimes A_n$ as a $G = GL(A_1) \times \cdots \times
GL(A_n)$-module:

Proposition 11.1 ([57]). The $G = GL(A_1) \times \cdots \times GL(A_n)$ isotypic decomposition of $S^d(A_1 \otimes \cdots \otimes A_n)$
is

$$S^d(A_1 \otimes \cdots \otimes A_n) = \bigoplus_{|\pi_1| = \cdots = |\pi_n| = d} ([\pi_1] \otimes \cdots \otimes [\pi_n])^{\mathfrak{S}_d} \otimes S_{\pi_1} A_1 \otimes \cdots \otimes S_{\pi_n} A_n,$$

where $([\pi_1] \otimes \cdots \otimes [\pi_n])^{\mathfrak{S}_d}$ denotes the space of $\mathfrak{S}_d$-invariants (i.e., instances of the trivial
representation of $\mathfrak{S}_d$) in $[\pi_1] \otimes \cdots \otimes [\pi_n]$.

The $([\pi_1] \otimes \cdots \otimes [\pi_n])^{\mathfrak{S}_d}$ factor in the tensor product just serves to tell us the multiplicity of
$S_{\pi_1} A_1 \otimes \cdots \otimes S_{\pi_n} A_n$, via its dimension.

Proof. We need to decompose $S^d(A_1 \otimes \cdots \otimes A_n)$ as a $G = GL(A_1) \times \cdots \times GL(A_n)$-module. We
have

$$(A_1 \otimes \cdots \otimes A_n)^{\otimes d} = \bigoplus_{|\pi_j| = d} ([\pi_1] \otimes \cdots \otimes [\pi_n]) \otimes (S_{\pi_1} A_1 \otimes \cdots \otimes S_{\pi_n} A_n).$$

But $S^d(A_1 \otimes \cdots \otimes A_n) \subset (A_1 \otimes \cdots \otimes A_n)^{\otimes d}$ is the set of elements invariant under the action of
$\mathfrak{S}_d$. (Here $\mathfrak{S}_d$ only acts on the $[\pi_j]$; it leaves the $S_{\pi_j} A_j$'s invariant.) □

Now we need a way to calculate $\dim ([\pi_1] \otimes \cdots \otimes [\pi_n])^{\mathfrak{S}_d}$. This can be done using characters
in low degrees (degrees as high as your computer is willing to tolerate). The key point is

$$\dim ([\pi_1] \otimes \cdots \otimes [\pi_n])^{\mathfrak{S}_d} = \frac{1}{d!} \sum_{\sigma \in \mathfrak{S}_d} \chi_{\pi_1}(\sigma) \cdots \chi_{\pi_n}(\sigma)$$

where $\chi_{\pi_j} : \mathfrak{S}_d \to \mathbb{C}$ is the character of $[\pi_j]$ (see, e.g., [27, 19]). For any given $d$, one can compute
these dimensions, but there is no known closed form formula for them when $n > 2$.
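
As an illustration of the character computation, the sketch below evaluates the averaging formula for $d = 4$. It hard-codes the standard character table of $\mathfrak{S}_4$, with conjugacy classes labelled by cycle type; the table values are well known but are not taken from the text, and the names `CLASS_SIZES`, `CHARACTERS` and `invariant_dimension` are hypothetical.

```python
# Conjugacy classes of S_4 by cycle type, with their sizes (they sum to 4! = 24).
CLASS_SIZES = {"1111": 1, "211": 6, "22": 3, "31": 8, "4": 6}

# Character values of the irreducible modules [pi] on each class.
CHARACTERS = {
    (4,):         {"1111": 1, "211": 1,  "22": 1,  "31": 1,  "4": 1},
    (3, 1):       {"1111": 3, "211": 1,  "22": -1, "31": 0,  "4": -1},
    (2, 2):       {"1111": 2, "211": 0,  "22": 2,  "31": -1, "4": 0},
    (2, 1, 1):    {"1111": 3, "211": -1, "22": -1, "31": 0,  "4": 1},
    (1, 1, 1, 1): {"1111": 1, "211": -1, "22": 1,  "31": 1,  "4": -1},
}


def invariant_dimension(*partitions):
    """dim of the S_4-invariants in [pi_1] (x) ... (x) [pi_n], by averaging characters."""
    total = 0
    for cycle_type, size in CLASS_SIZES.items():
        term = size
        for pi in partitions:
            term *= CHARACTERS[pi][cycle_type]
        total += term
    order = sum(CLASS_SIZES.values())
    assert total % order == 0  # the average of characters is an integer
    return total // order


multiplicity_211 = invariant_dimension((2, 1, 1), (2, 1, 1), (2, 1, 1))
```

Summing over conjugacy classes weighted by their sizes is the same as summing over all of $\mathfrak{S}_4$, since characters are class functions. The value `multiplicity_211` is the multiplicity of $S_{2,1,1}A \otimes S_{2,1,1}B \otimes S_{2,1,1}C$ in $S^4(A \otimes B \otimes C)$.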

Obtaining the above decomposition is essential when dealing with explicit equations. For
example, Strassen has a priori three sets of equations for $\sigma_3(\mathbb{P}^2 \times \mathbb{P}^2 \times \mathbb{P}^2)$. Are they redundant
or not? By examining these equations as modules we find that they are:

11.2. Strassen's equations as modules. Recall from §10 that Strassen's equations for
$\sigma_b(Seg(\mathbb{P}^2 \times \mathbb{P}^{b-1} \times \mathbb{P}^{b-1}))$ in degree $b + 1$ are obtained by composing the inclusion

$$\Lambda^2 A \otimes S^{b-1} A \otimes \Lambda^b B \otimes B \otimes C \otimes \Lambda^b C \to (A \otimes B \otimes C)^{\otimes (b+1)}$$

with the projection

$$(A \otimes B \otimes C)^{\otimes (b+1)} \to S^{b+1}(A \otimes B \otimes C).$$

Now $\Lambda^2 A \otimes S^{b-1} A \otimes \Lambda^b B \otimes B \otimes C \otimes \Lambda^b C$ is not an irreducible module. Since the maps are
$G$-equivariant, by Schur's lemma the image is a direct sum of irreducible submodules. We need to
determine which modules in $\Lambda^2 A \otimes S^{b-1} A \otimes \Lambda^b B \otimes B \otimes C \otimes \Lambda^b C$ map nontrivially into $S^{b+1}(A \otimes B \otimes C)$.

Since here $b = \dim B = \dim C$, we have, using a very special case of the Littlewood-Richardson
rule (see, e.g., [27], chapter 6),

$$(\Lambda^2 A \otimes S^{b-1} A) \otimes (\Lambda^b B \otimes B) \otimes (C \otimes \Lambda^b C) = (S_{b,1} A \oplus S_{b-1,1,1} A) \otimes \Lambda_{b,1} B \otimes \Lambda_{b,1} C$$

(where we use the notation $\Lambda_{b,1} B = S_{2,1,\dots,1} B$) so there are two possible modules. Were the
first in the image, then one would be able to get equations in the case $\dim A = 2$, but
$\sigma_3(\mathbb{P}^1 \times \mathbb{P}^2 \times \mathbb{P}^2) = \mathbb{P}(A \otimes B \otimes C)$, so only the second can occur (and it is easy to check that it does). We
conclude:

Proposition 11.2. [39] Strassen's equations for $\sigma_b(\mathbb{P}^2 \times \mathbb{P}^{b-1} \times \mathbb{P}^{b-1})$ expressed as a module is

$$S_{b-1,1,1} \mathbb{C}^3 \otimes \Lambda_{b,1} \mathbb{C}^b \otimes \Lambda_{b,1} \mathbb{C}^b,$$

in particular it is an irreducible module.

When $b = 3$, we obtain $S_{211} A \otimes S_{211} B \otimes S_{211} C$ which occurs with multiplicity one in $S^4(A \otimes B \otimes C)$.
Thus, despite the apparently different role of $A$ from $B$ and $C$, in this case - and only in this
case - exchanging the role of $A$ with $B$ or $C$ yields the same space of equations.

11.3. Highest weight vectors. When we study modules of polynomials, it will be convenient
to have a "best" polynomial in the module. For example, since an irreducible $G$-module in $S^d V^*$
is either entirely in or out of the ideal of a $G$-variety $Z \subset \mathbb{P}V$, it is sufficient to check just a single
polynomial in the module. In general, this "best polynomial" is provided by a choice of highest
weight vector. We explain how to obtain such vectors when $G = GL(A_1) \times \cdots \times GL(A_n)$.

Fix a basis $e_1, \dots, e_n$ of a vector space $V$. Let $W$ be an irreducible $GL(V)$-module occurring
in $V^{\otimes d}$ for some $d$. We say $w \in W$ is a highest weight vector for $W$, if $\rho(g).[w] = [w]$ for all upper
triangular matrices $g \in GL(V)$. (It makes sense to discuss matrices because we have fixed a
basis of $V$.) Highest weight vectors are in some sense the simplest vectors occurring in a module.
For example, when $W = S^d V$, $(e_1)^d$ is a highest weight vector. For $W = \Lambda^d V$, $e_1 \wedge e_2 \wedge \cdots \wedge e_d$
is a highest weight vector. In general the highest weight vector of an irreducible module will not
correspond to a decomposable tensor. In $c_\pi V^{\otimes d}$ ($\simeq S_\pi V$), the highest weight vector is

$$c_\pi (e_1^{\otimes p_1} \otimes e_2^{\otimes p_2} \otimes \cdots \otimes e_d^{\otimes p_d})$$

where $\pi = (p_1, \dots, p_d)$ and we allow the last few $p_j$ to be zero in order to have a uniform
expression.

In [37] we give explicit algorithms for writing down highest weight vectors of submodules of
$S^d(A_1 \otimes \cdots \otimes A_n)$.

An important observation for the next section is that if $v \in A^{\otimes d}$ is a highest weight vector for a
submodule corresponding to a partition $\pi$ and $a_1, \dots, a_n$ is a basis of $A$, then $v$ may be expressed
using only the vectors $a_1, \dots, a_{l(\pi)}$.
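
The two basic examples above can be verified directly. The following sketch assumes NumPy and stores elements of $V \otimes V$ as $n \times n$ arrays, so that $g \in GL(V)$ acts by $M \mapsto g M g^T$; this encoding and the variable names are choices made for this illustration. It checks that $(e_1)^2$ and $e_1 \wedge e_2$ are rescaled, not moved, by a random upper triangular matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
g = np.triu(rng.standard_normal((n, n)))  # a random upper triangular matrix
e = np.eye(n)  # e[0], e[1], ... play the role of the fixed basis e_1, e_2, ...

# (e_1)^2 in S^2 V, stored as a symmetric n x n array.
sym_vector = np.outer(e[0], e[0])
g_sym_vector = g @ sym_vector @ g.T  # g acts on V (x) V by v (x) w -> gv (x) gw

# e_1 ^ e_2 in Lambda^2 V, stored as a skew-symmetric n x n array.
wedge_vector = np.outer(e[0], e[1]) - np.outer(e[1], e[0])
g_wedge_vector = g @ wedge_vector @ g.T

# Both lines are preserved; the scalars are products of diagonal entries of g.
sym_is_highest = np.allclose(g_sym_vector, g[0, 0] ** 2 * sym_vector)
wedge_is_highest = np.allclose(g_wedge_vector, g[0, 0] * g[1, 1] * wedge_vector)
```

The scalars $g_{11}^2$ and $g_{11} g_{22}$ that appear are the weights of the two vectors, restricted to the diagonal of $g$.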

11.4. Inheritance. By examining equations grouped into modules, the dimensions of the vector 
spaces involved only come into play when verifying that the dimension is large enough to support 
a given module. For example: 

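Proposition 11.3. [39] If a copy of

$$S_{\pi_1} A_1 \otimes \cdots \otimes S_{\pi_n} A_n$$

occurs in

$$I_d(\sigma_r(Seg(\mathbb{P}A_1^* \times \cdots \times \mathbb{P}A_n^*))),$$

then for all vector spaces $A_j' \supseteq A_j$, the corresponding copy of

$$S_{\pi_1} A_1' \otimes \cdots \otimes S_{\pi_n} A_n'$$

occurs in

$$I_d(\sigma_r(Seg(\mathbb{P}A_1'^* \times \cdots \times \mathbb{P}A_n'^*))).$$

Moreover, a module $S_{\pi_1} A_1' \otimes \cdots \otimes S_{\pi_n} A_n'$ where the length of each $\pi_j$ is at most $\mathbf{a}_j$ is in
$I_d(\sigma_r(Seg(\mathbb{P}A_1'^* \times \cdots \times \mathbb{P}A_n'^*)))$ if and only if the corresponding module is in
$I_d(\sigma_r(Seg(\mathbb{P}A_1^* \times \cdots \times \mathbb{P}A_n^*)))$.

Our notation is such that given a variety $Z \subset \mathbb{P}V^*$, $I(Z) \subset S^\bullet V$ denotes its ideal and

$$I_d(Z) = I(Z) \cap S^d V.$$

Proof. A module is in the ideal if and only if its highest weight vector is. Choose ordered
bases for $A_j'$ such that the first $\mathbf{a}_j$ basis vectors form a basis of $A_j$. Then any highest weight
vector for $S_{\pi_1} A_1' \otimes \cdots \otimes S_{\pi_n} A_n'$ is also a highest weight vector for $S_{\pi_1} A_1 \otimes \cdots \otimes S_{\pi_n} A_n$ as long as
$l(\pi_j) \le \mathbf{a}_j$. □

Thus a copy of a module $S_{\pi_1} A_1 \otimes \cdots \otimes S_{\pi_n} A_n$ will be in $I(\sigma_r(Seg(\mathbb{P}A_1^* \times \cdots \times \mathbb{P}A_n^*)))$
if and only if the corresponding copy of the module $S_{\pi_1} \mathbb{C}^{l(\pi_1)} \otimes \cdots \otimes S_{\pi_n} \mathbb{C}^{l(\pi_n)}$ is in the ideal of
$\sigma_r(Seg(\mathbb{P}^{l(\pi_1)-1} \times \cdots \times \mathbb{P}^{l(\pi_n)-1}))$.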

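It is straightforward to determine $I_3(\sigma_2(Seg(\mathbb{P}A_1^* \times \cdots \times \mathbb{P}A_k^*)))$ as a module:

Theorem 11.4 ([37], Theorem 4.7). The space of cubics vanishing on $\sigma_2(Seg(\mathbb{P}A_1^* \times \cdots \times \mathbb{P}A_k^*))$
is

$$I_3(\sigma_2(Seg(\mathbb{P}A_1^* \times \cdots \times \mathbb{P}A_k^*))) = \bigoplus_{\substack{I+J+L=\{1,\dots,k\}, \\ j=|J|>1,\ |L|>0}} \frac{2^{j-1} - (-1)^{j-1}}{3}\, S_3 A_I \otimes S_{21} A_J \otimes S_{111} A_L$$

$$\oplus \bigoplus_{\substack{I+J=\{1,\dots,k\}, \\ j=|J|>3}} \Big( \frac{2^{j-1} - (-1)^{j-1}}{3} - 1 \Big) S_3 A_I \otimes S_{21} A_J \ \oplus \bigoplus_{\substack{I+L=\{1,\dots,k\}, \\ |L|>0 \text{ even}}} S_3 A_I \otimes S_{111} A_L.$$

Here $S_3 A_I$ denotes $\bigotimes_{i \in I} S^3 A_i$, and similarly for $S_{21} A_J$ and $S_{111} A_L$.

11.5. Prolongation. For $A \subset S^k V$ define $A^{(p)} = (A \otimes S^p V) \cap S^{p+k} V$, the $p$-th prolongation of
$A$. Let

$$\mathrm{Zeros}(A) = \{ [v] \in \mathbb{P}V^* \mid P(v) = 0 \ \forall P \in A \}.$$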

Ideals of secant varieties satisfy a prolongation property; in particular for secant varieties of
intersections of quadrics we have:

Lemma 11.5. [38] Let $A \subset S^2 V$ be a linear subspace with zero set $\mathrm{Zeros}(A) \subset \mathbb{P}V^*$. Then

$$\mathrm{Zeros}(A^{(k-1)}) \supseteq \sigma_k(\mathrm{Zeros}(A)).$$

Moreover, if $\mathrm{Zeros}(A)$ is not contained in a hyperplane, then for $k \ge 2$, $I_k(\sigma_k(\mathrm{Zeros}(A))) = 0$,
and if $A = I_2(\mathrm{Zeros}(A))$, then $I_{k+1}(\sigma_k(\mathrm{Zeros}(A))) = A^{(k-1)}$.

Usually, for a variety $X \subset \mathbb{P}V$, $I(\sigma_k(X))$ is not generated in degree $k + 1$. For example,
consider the simplest intersection of quadrics, four points in $\mathbb{P}^2$. They generate six lines so $\sigma(X)$
is a hypersurface of degree six.

Let $G$ be a semi-simple Lie or algebraic group, let $V_\lambda$ be the irreducible $G$-module of highest
weight $\lambda$ and let $X = G/P \subset \mathbb{P}V_\lambda^*$ be a homogeneously embedded rational homogeneous variety,
i.e., the orbit of a highest weight line. ($X = Seg(\mathbb{P}A_1^* \times \cdots \times \mathbb{P}A_n^*) \subset \mathbb{P}(A_1 \otimes \cdots \otimes A_n)^* = \mathbb{P}V^*$
is one such.) By an unpublished theorem of Kostant, $I_2(X) = (V_{2\lambda})^\perp \subset S^2 V_\lambda$ and $I(X)$ is
generated in degree two. More generally, $I_k(X) = (V_{k\lambda})^\perp \subset S^k V_\lambda$. We adopt the notation that
if $V = V_\lambda$, we write $V^k = V_{k\lambda}$. In the Segre case,

$$V^k = S^k A_1 \otimes \cdots \otimes S^k A_n \subset S^k(A_1 \otimes \cdots \otimes A_n).$$

Proposition 11.6. [37] Let $X \subset \mathbb{P}V^*$ be a variety not contained in a linear space. Then for all
$d > 0$, $I_d(\sigma_d(X)) = 0$.

If $X = G/P$ is homogeneous, then $I_{d+1}(\sigma_d(X))$ is the kernel of the contraction map
$(V^2)^* \otimes S^{d+1} V \to S^{d-1} V$.

Examples illustrating Proposition 11.6 are given in [37]. Extensions and further applications
of prolongations are given in [51].

12. Auxiliary varieties 

A simple observation is that if $X \subset Y \subset \mathbb{P}V$, then any polynomial vanishing on $Y$ also
vanishes on $X$. We want to find polynomials in the ideal of secant varieties of Segre varieties,
so it is natural to look for varieties $Y$ that contain $X = \sigma_r(\mathbb{P}A_1 \times \cdots \times \mathbb{P}A_n)$ whose ideals we
understand. In this section we give two examples of such varieties $Y$.

12.1. $Flat_r^{\mathbf{a}}$ and the GSS conjecture. For example, note that $A \otimes B \otimes C = A \otimes (B \otimes C)$, which
leads to the simple observation that $\sigma_r(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)) \subset \sigma_r(Seg(\mathbb{P}A \times \mathbb{P}(B \otimes C)))$. Moreover
we explicitly know the generators of the ideal of $\sigma_r(Seg(\mathbb{P}A \times \mathbb{P}(B \otimes C)))$, see §10.1.

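More generally, define the flattening of a tensor $T \in A_1 \otimes \cdots \otimes A_n$ by letting $I =
\{i_1, \dots, i_p\} \subset \{1, \dots, n\}$, $J = \{1, \dots, n\} \setminus I$, $A_I = A_{i_1} \otimes \cdots \otimes A_{i_p}$, $A_J = A_{j_1} \otimes \cdots \otimes A_{j_{n-p}}$ and
considering $T \in A_I \otimes A_J$.

Let $\mathbf{a} = (a_1, \dots, a_n)$ and define $I_{Flat_r^{\mathbf{a}}}$ to be the ideal generated by the modules
$\Lambda^{r+1} A_I^* \otimes \Lambda^{r+1} A_J^* \subset S^{r+1}(A_1 \otimes \cdots \otimes A_n)^*$ as $I, J$ range over complementary subsets of $\{1, \dots, n\}$. We let $Flat_r^{\mathbf{a}}$
denote the corresponding variety, i.e.,

$$Flat_r^{\mathbf{a}} = \bigcap_{I,J} \sigma_r(Seg(\mathbb{P}A_I \times \mathbb{P}A_J)).$$

We have $\sigma_r(\mathbb{P}A_1 \times \cdots \times \mathbb{P}A_n) \subset Flat_r^{\mathbf{a}}$.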
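
The containment can be seen concretely: every flattening of a tensor of rank at most $r$ is a matrix of rank at most $r$. The sketch below assumes NumPy; the helper `flattening_rank` and the chosen dimensions are hypothetical. It builds a sum of $r$ decomposable tensors with four factors and computes the rank of the flattening for every proper nonempty subset $I$.

```python
from itertools import combinations

import numpy as np


def flattening_rank(T, I):
    """Rank of T viewed as a matrix in A_I (x) A_J, with J the complement of I."""
    J = [k for k in range(T.ndim) if k not in I]
    rows = int(np.prod([T.shape[k] for k in I]))
    # Move the factors indexed by I to the front, then group them into rows.
    matrix = np.transpose(T, list(I) + J).reshape(rows, -1)
    return int(np.linalg.matrix_rank(matrix))


rng = np.random.default_rng(3)
dims, r = (3, 3, 3, 3), 2  # illustrative sizes, not taken from the text

# A tensor of rank at most r: a sum of r random decomposable tensors.
T = np.zeros(dims)
for _ in range(r):
    factors = [rng.standard_normal(d) for d in dims]
    T += np.einsum("i,j,k,l->ijkl", *factors)

flattening_ranks = {
    I: flattening_rank(T, I)
    for size in range(1, len(dims))
    for I in combinations(range(len(dims)), size)
}
```

Vanishing of the $(r+1) \times (r+1)$ minors of each of these matrices is exactly the condition cut out by the modules $\Lambda^{r+1} A_I^* \otimes \Lambda^{r+1} A_J^*$.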

The GSS conjecture [28] is that equality holds when $r = 2$. Actually the conjecture is the
stronger statement that $I_{\sigma_2(\mathbb{P}A_1 \times \cdots \times \mathbb{P}A_n)} = I_{Flat_2^{\mathbf{a}}}$. The weaker statement that equality holds as
sets was proven in [37]. It was also shown in [37] that the conjecture holds when $\mathbf{a} = (a_1, a_2, a_3)$.
Since $\sigma_2(\mathbb{P}A_1 \times \cdots \times \mathbb{P}A_n)$ is reduced and irreducible, and $Flat_2^{\mathbf{a}}$ is irreducible, to prove the
conjecture it would be sufficient to show $Flat_2^{\mathbf{a}}$ is reduced. Using the methods outlined in §13,
it is possible to reduce the conjecture further to showing that $Flat_2^{\mathbf{a}}$ is arithmetically
Cohen-Macaulay, see [3d].

In [28], a computer calculation is presented that gives the dimensions of the minimal space
of generators of the ideals of $\sigma_2(Seg(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1))$ and $\sigma_2(Seg(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1))$,
which, as shown in [5], allows one to prove the GSS conjecture for up to five factors. The proof
relies on a variant of which was arrived at independently using the geometry of phylogenetic 
trees. 

12.2. Subspace varieties. 

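Definition 12.1. Define the $s$-subspace variety

(8) $\quad Sub_s := \mathbb{P}\{ T \in A \otimes B \otimes C \mid \exists A' \subset A, B' \subset B, C' \subset C, \ \dim A' = \dim B' = \dim C' = s, \ T \in A' \otimes B' \otimes C' \}$

Note that $\sigma_s(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)) \subset Sub_s$, so the equations of $Sub_s$ are also equations for
$\sigma_s(Seg(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C))$.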
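
Membership in $Sub_s$ can be detected by the ranks of the three flattenings $A \otimes (B \otimes C)$, $B \otimes (A \otimes C)$ and $C \otimes (A \otimes B)$, each of which must be at most $s$. The sketch below, assuming NumPy and with hypothetical names throughout, builds a tensor inside $A' \otimes B' \otimes C'$ from an $s \times s \times s$ core and random bases of the subspaces, and compares it with a generic tensor.

```python
import numpy as np


def multilinear_rank(T):
    """Ranks of the flattenings of T that single out one factor at a time."""
    return tuple(
        int(np.linalg.matrix_rank(np.moveaxis(T, k, 0).reshape(T.shape[k], -1)))
        for k in range(T.ndim)
    )


rng = np.random.default_rng(4)
a, b, c, s = 5, 4, 6, 2  # illustrative dimensions, not taken from the text

# Columns of these matrices span s-dimensional subspaces A', B', C'.
basis_A = rng.standard_normal((a, s))
basis_B = rng.standard_normal((b, s))
basis_C = rng.standard_normal((c, s))

# A tensor in A' (x) B' (x) C', written via a core in the chosen bases.
core = rng.standard_normal((s, s, s))
T = np.einsum("pqr,ip,jq,kr->ijk", core, basis_A, basis_B, basis_C)

ranks_in_subspace_variety = multilinear_rank(T)
ranks_of_generic_tensor = multilinear_rank(rng.standard_normal((a, b, c)))
```

For the first tensor every entry of the tuple is at most $s$, while a generic tensor of the same format has full flattening ranks.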

Proposition 12.2. [39] The ideal of $\sigma_r(Seg(\mathbb{P}A_1^* \times \cdots \times \mathbb{P}A_n^*))$, when each $\dim A_j > r$, is
generated by the union of the modules in its ideal inherited from the modules generating
the ideal of $\sigma_r(Seg(\mathbb{P}^{r-1} \times \cdots \times \mathbb{P}^{r-1}))$ and the modules generating the ideal of $Sub_r$.

To see this, note that by Proposition 11.3 a copy of a module $S_{\pi_1} A_1 \otimes \cdots \otimes S_{\pi_n} A_n$ will
be in $I(\sigma_r(Seg(\mathbb{P}^{r-1} \times \cdots \times \mathbb{P}^{r-1})))$ if and only if the corresponding copy of the module
$S_{\pi_1} \mathbb{C}^{l(\pi_1)} \otimes \cdots \otimes S_{\pi_n} \mathbb{C}^{l(\pi_n)}$ is in the ideal of $\sigma_r(Seg(\mathbb{P}^{l(\pi_1)-1} \times \cdots \times \mathbb{P}^{l(\pi_n)-1}))$.

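The ideal of $Sub_r$ is easy to describe:

Theorem 12.3. [41] The ideal of $Sub_r$ is generated in degree $r + 1$ by the modules

(9) $\quad \Lambda^{r+1} A_j \otimes \Lambda^{r+1}(A_1 \otimes \cdots \otimes A_{j-1} \otimes A_{j+1} \otimes \cdots \otimes A_n)$

for $1 \le j \le n$ (minus redundancies).

Proof. First note that the ideal of $Sub_r$ consists of all modules $S_{\pi_1} A_1 \otimes \cdots \otimes S_{\pi_n} A_n$ occurring in
$S^d(A_1 \otimes \cdots \otimes A_n)$ where each $\pi_j$ is a partition of $d$ and at least one $\pi_j$ has $l(\pi_j) > r$. We need
to show that this ideal is generated by the modules (9). But for each $j$, the ideal consisting of
representations $S_{\pi_1} A_1 \otimes \cdots \otimes S_{\pi_n} A_n$ occurring in $S^d(A_1 \otimes \cdots \otimes A_n)$ where $l(\pi_j) > r$ is generated
in degree $r + 1$ by

$$\Lambda^{r+1} A_j \otimes \Lambda^{r+1}(A_1 \otimes \cdots \otimes A_{j-1} \otimes A_{j+1} \otimes \cdots \otimes A_n),$$

because it is just the ideal of $\sigma_r(\mathbb{P}A_j \times \mathbb{P}(A_1 \otimes \cdots \otimes \hat{A}_j \otimes \cdots \otimes A_n))$. □

Corollary 12.4. [37] The ideal of $\sigma_2(Seg(\mathbb{P}A^* \times \mathbb{P}B^* \times \mathbb{P}C^*))$ is generated in degree three by
$\Lambda^3 A \otimes \Lambda^3(B \otimes C)$, $\Lambda^3 B \otimes \Lambda^3(A \otimes C)$ and $\Lambda^3 C \otimes \Lambda^3(A \otimes B)$.

Proof. $\sigma_2(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C) = Sub_2$ because $\sigma_2(\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1) = \mathbb{P}(\mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2)$. □

We remark that the spaces $\Lambda^3 A \otimes \Lambda^3(B \otimes C)$, $\Lambda^3 B \otimes \Lambda^3(A \otimes C)$, $\Lambda^3 C \otimes \Lambda^3(A \otimes B)$ intersect, so
there is redundancy in the above description. This redundancy becomes apparent if one expresses
the spaces as sums of irreducible modules.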

The $s$-subspace variety is a cousin of the rank varieties in [56]. Moreover, it has a natural
desingularization explained in §13.

13. Weyman's method

In this section we describe techniques for obtaining generators of the ideals of secant varieties
of Segre varieties and more generally of $G$-varieties $Z \subset \mathbb{P}V$, where $G$ is a reductive group, $V$
is a $G$-module and $Z$ is a variety invariant under the action of $G$. In addition to providing
generators of the ideal, the techniques enable one to compute the entire minimal free resolution
of the ideal of $Z$ as well as precise information about the singularities of $Z$. These techniques
require considerably more machinery from commutative algebra and representation theory than
we have used up until this point. We expect they will be useful in future work.

Let $G$ be a reductive group, let $V$ be an irreducible $G$-module, and let $Z \subset \mathbb{P}V$ be a $G$-variety.

$G$-varieties are often uniruled by large linear spaces, and singularities occur when the linear
spaces crash into one another. To remedy this, one could try to untangle the linear spaces.
This appears to be the idea underlying Kempf's desingularization by the collapsing of a vector
bundle. The idea is, given a $G$-variety $Z \subset \mathbb{P}V$, to find (i.) a homogeneous variety $G/P$, (ii.)
a homogeneous vector bundle $E \to G/P$ that is the subbundle of a trivial bundle $\underline{V}$ with fiber
isomorphic to $V$ (here $P$ is a parabolic subgroup of $G$), and (iii.) a map $\mathbb{P}E \to Z$ that is a
desingularization.

For example, let $G(k, A)$ denote the Grassmannian of $k$-planes through the origin in $A$. Let
$G = GL(A) \times GL(B) \times GL(C)$, and let $Z = Sub_s$ be as defined in Definition 12.1. Then let $G/P = G(s, A) \times
G(s, B) \times G(s, C)$ and let $E = \mathcal{S}_A \otimes \mathcal{S}_B \otimes \mathcal{S}_C$, where the fiber $(\mathcal{S}_A)_F$ is the $s$-plane $F \subset A$. Then $\mathbb{P}E \to Sub_s$
gives the desired desingularization.

Weyman takes Kempf's idea a step further by observing that often one can "push down" the
minimal free resolution of the total space of $E$ as a subvariety of the total space of the trivial
bundle (more precisely, of the structure sheaf of $E$ as an $\mathcal{O}_{\underline{V}}$-module) to obtain the minimal free
resolution of $Z$. Moreover, since the whole procedure is $G$-equivariant, one gets the generators
as modules.

The idea is as follows: Assume that the sheaf cohomology groups $H^i(S^d(E^*))$ are all zero for
$i > 0$ and for all $d$. Consider the exact sequence

$$0 \to (\underline{V}/E)^* \to \underline{V}^* \to E^* \to 0$$

giving rise, for each $j$, to a sequence

$$0 \to \Lambda^j(\underline{V}/E)^* \to \Lambda^j \underline{V}^* \to \Lambda^{j-1} \underline{V}^* \otimes E^* \to \cdots \to \underline{V}^* \otimes S^{j-1} E^* \to S^j E^* \to 0.$$

Since $\underline{V}$ is trivial, and by our hypothesis all terms but the first have no cohomology in degree
greater than zero, when we take the long exact sequence in cohomology, we can split it into
short exact sequences that we can in turn splice together to conclude that $H^k(\Lambda^j(\underline{V}/E)^*)$ is the
$k$-th homology of the sequence

$$0 \to H^0(\Lambda^j \underline{V}^*) \to H^0(\Lambda^{j-1} \underline{V}^* \otimes E^*) \to \cdots \to H^0(S^j E^*) \to 0.$$

We add the hypothesis that the last step is surjective.
Now consider

$$\begin{array}{ccccccccc}
\Lambda^d V^* & \to & \Lambda^{d-1} V^* \otimes H^0(S^1 E^*) & \to \cdots \to & V^* \otimes H^0(S^{d-1} E^*) & \to & H^0(S^d E^*) & \to & 0 \\
\uparrow & & \uparrow & & \uparrow & & \uparrow & & \\
\Lambda^d V^* & \to & \Lambda^{d-1} V^* \otimes V^* & \to \cdots \to & V^* \otimes S^{d-1} V^* & \to & S^d V^* & \to & 0 \\
\uparrow & & \uparrow & & \uparrow & & \uparrow & & \\
0 & \to & \Lambda^{d-1} V^* \otimes I_1(Z) & \to \cdots \to & V^* \otimes I_{d-1}(Z) & \to & I_d(Z) & &
\end{array}$$

where in the middle row we have $S^d V^* = H^0(S^d \underline{V}^*)$ which justifies the top row of vertical
arrows. The horizontal arrows are from the Koszul sequence. The generators of the ideal of $Z$
in degree $d$ correspond to the cokernel of the lower right arrow. Now apply the snake lemma
to see it is the homology of the $d$-th entry in the top sequence, which by the observation above
is $H^{d-1}(\Lambda^d(\underline{V}/E)^*)$. (One obtains the full minimal free resolution in a similar fashion.)

All the bundles in question are homogeneous. If they are moreover irreducible, then one can 
apply the Bott-Borel-Weil theorem to reduce the calculation of the cohomology to a combina- 
torial calculation with the Weyl group of G. Even if they are not irreducible, one can use BBW 
on the associated graded bundles and then apply spectral sequences. For those who prefer to 
avoid spectral sequences in such calculations, see [36].

Note that since we had to use the snake lemma, we have no canonical way of identifying
$H^{d-1}(\Lambda^d(\underline{V}/E)^*)$ with the space of generators in degree $d$, but in the equivariant setup, at least
they agree as modules.

Sometimes it is sufficient to work with a partial desingularization of Z, or a desingularization 
of a $G$-variety that contains $Z$ as a variety of small codimension.

In fact, one does not need $Z$ to be a $G$-variety (although for applications it almost always is).

Theorem 13.1. [56] Let $Y \subset \mathbb{P}V$ be a variety and suppose there is a projective variety $B$ and
a vector bundle $E \to B$ that is a subbundle of a trivial bundle $\underline{V} \to B$ with $\underline{V}_z \simeq V$ for $z \in B$
such that $E \to Y$ is a desingularization. Write $\eta = E^*$ and $\xi = (\underline{V}/E)^*$.

If the sheaf cohomology groups $H^i(B, S^d \eta)$ are all zero for $i > 0$ and the linear maps
$H^0(B, S^d \eta) \otimes V^* \to H^0(B, S^{d+1} \eta)$ are surjective for all $d \ge 0$, then

(1) $Y$ is normal, with rational singularities.

(2) The coordinate ring $K[Y]$ satisfies $K[Y]_d \simeq H^0(B, S^d \eta)$.

(3) The vector space of minimal generators of the ideal of $Y$ in degree $d$ is isomorphic to
$H^{d-1}(B, \Lambda^d \xi)$, which is also the homology of the sequence

$$\Lambda^2 V \otimes H^0(B, S^{d-2} \eta) \to V \otimes H^0(B, S^{d-1} \eta) \to H^0(B, S^d \eta).$$

(4) More generally, $\bigoplus_j H^j(B, \Lambda^{i+j} \xi)$ is isomorphic to the $i$-th term in the minimal free resolution
of $Y$.

If moreover $Y$ is a $G$-variety and the desingularization is $G$-equivariant, then the identifications
above are as $G$-modules.

Using these methods, the minimal generators of the ideals of $\sigma_r(Seg(\mathbb{P}^1 \times \mathbb{P}^b \times \mathbb{P}^c))$,
$\sigma_3(\mathbb{P}^a \times \mathbb{P}^b \times \mathbb{P}^c)$ and $\sigma_2(\mathbb{P}^a \times \mathbb{P}^b \times \mathbb{P}^c \times \mathbb{P}^d)$ have been determined, see [41]. The method also gives
information about the singularities (e.g. normality, arithmetically Cohen-Macaulay-ness), which,
as mentioned above, can be used to reduce problems such as the GSS conjecture.

14. Appendix: Invariant formulations of two definitions from complexity theory

The purpose of this section is to show how multiplicative complexity and separations can be
viewed invariantly, and to discuss advantages of the invariant perspective. While the discussion
is elementary, it is intended primarily for those already familiar with these notions and their
uses.

14.1. Multiplicative complexity and tensors. A slightly larger class of algorithms for
executing bilinear maps $T : A \times B \to C$ than those discussed in §1.2 is obtained by writing
$V = A \oplus B$ and considering $T$ as a bilinear map $V \times V \to C$. The multiplicative complexity of $T$
is the rank of $T$ considered as a bilinear map $V \times V \to C$. See Example 14.1 for an example of a tensor
$T$ whose multiplicative complexity is less than $R(T)$.

The multiplicative complexity is the minimal number of multiplications needed over all
algorithms expressible as straight line programs, which is a class of algorithms that are intended to
model (classical) computer programs. See [14], Definition 4.2 for a precise definition and a proof
of this statement.

The multiplicative complexity of a map is bounded both above by its rank (obvious) and
below by half the rank (see [14], p. 354). So if one is only concerned with the exponent of matrix
multiplication, one may restrict to the study of rank.

Our definition of multiplicative complexity gives an immediate proof of (14.8) in [14] which
says that $R(T) \ge \text{multiplicative complexity}(T) \ge \frac{1}{2} R(T)$. To see this, note that

$$(A \oplus B) \otimes (A \oplus B) \otimes C = A \otimes B \otimes C \ \oplus\ B \otimes A \otimes C \ \oplus\ A \otimes A \otimes C \ \oplus\ B \otimes B \otimes C,$$

so any expression for $T$ in $(A \oplus B)^{\otimes 2} \otimes C$
of rank $r$ projects to an expression for $T$ of rank at most $2r$ in $A \otimes B \otimes C$ (and of course the
projections to $A \otimes A \otimes C$ and $B \otimes B \otimes C$ must be zero).
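
The projection argument can be illustrated on a single decomposable term. The sketch below assumes NumPy; all variable names are hypothetical. It forms one term $(\alpha + \beta) \otimes (\alpha' + \beta') \otimes c$ on $V = A \oplus B$, extracts its $A \otimes B \otimes C$ and $B \otimes A \otimes C$ blocks, and checks that, after swapping the latter into $A \otimes B \otimes C$, the result is the sum of the two decomposable tensors $\alpha \otimes \beta' \otimes c$ and $\alpha' \otimes \beta \otimes c$. This is the reason a rank $r$ expression on $V$ yields rank at most $2r$ on $A \otimes B \otimes C$.

```python
import numpy as np

rng = np.random.default_rng(5)
dim_a, dim_b, dim_c = 2, 3, 4  # illustrative dimensions

# One decomposable term (alpha + beta) (x) (alpha' + beta') (x) c on V = A (+) B.
alpha, alpha_prime = rng.standard_normal(dim_a), rng.standard_normal(dim_a)
beta, beta_prime = rng.standard_normal(dim_b), rng.standard_normal(dim_b)
c = rng.standard_normal(dim_c)
term = np.einsum(
    "i,j,k->ijk",
    np.concatenate([alpha, beta]),
    np.concatenate([alpha_prime, beta_prime]),
    c,
)

# Blocks of the term in A (x) B (x) C and in B (x) A (x) C.
block_ab = term[:dim_a, dim_a:, :]
block_ba = term[dim_a:, :dim_a, :]

# Swap the first two factors of the second block and add.
projected = block_ab + np.transpose(block_ba, (1, 0, 2))

# The projection is a sum of two decomposable tensors in A (x) B (x) C.
two_terms = np.einsum("i,j,k->ijk", alpha, beta_prime, c) + np.einsum(
    "i,j,k->ijk", alpha_prime, beta, c
)
is_sum_of_two = np.allclose(projected, two_terms)
```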

Here is an example where the multiplicative complexity of a tensor is lower than its rank 
whose presentation here also illustrates our definition. 

Example 14.1. Write $V = A \oplus B$. The multiplicative complexity of $T \in A \otimes B \otimes C$ is its rank
considered as an element of $V \otimes V \otimes C$. (This definition differs from those in the literature, e.g.,
[14] p. 352, but is equivalent.) Alekseyev [3], building on work of Hopcroft and Kerr [31], showed
that $\mathrm{Rank}(M_{2,2,3}) = 11$, but Waksman [55] gives an explicit algorithm for $M_{2,2,3}$ that uses 10
multiplications. Here is such an algorithm expressed as a tensor in $(A \oplus B) \otimes (A \oplus B) \otimes C$:

M 2 , 2 ,3 =^(a\ + b\)®(a\ + b\)®{c\ - c?) + -{a\ + b 2 2 )®{a\ + b\)®(c\ + c\ + c|) 

+ -{a\ + bl)®{a\ + b\)®{c\ - c§) + (a 2 + b 2 )®{a 2 + b\)®c\ 

+ + b 2 2 )®{al + b\)®{-c\ + c\ - c|) + (a\ + 6|)®(a| + b\)®c\ 

+ \(<A ~ bjM-al + b\)®(c\ + cf) + -(a\ - b 2 2 )®{-a\ + b\)®{c\ - cj - c|) 

+ \(*\ ~ bl)®{-a\ + b\)®(c\ + c 2 ) + \{a\~ b 2 2 )®{-a 2 2 + b\)®{c\ + c 2 + 4). 

Remark 14.2. It might also be natural to consider expressions of $T \in A \otimes B \otimes C$ in $(A \oplus B \oplus C)^{\otimes 3}$,
although it is not clear how to encode such an object in a straight line program. In any case,
the savings would be at best by a factor of 6 by the same reasoning as in the paragraph above.

14.2. Separations of computations. A standard technique for showing lower bounds (due to 
Alder and Strassen [2]), is separations. The best known lower bound for M3 3 3 is 19 (due to 
Blaser |llj). It is obtained by extensive use of separations. In this section we define separations 
in a more invariant fashion than in [2] and suggest a more geometric variant. 

Definition 14.3. Let $\phi\in A^*\otimes B^*\otimes C$ be a computed tensor with computation of length $r$. 
Let $A_1\subset A$, $B_1\subset B$, $C_1\subset C$ be subspaces. We say $\phi$ separates $(A_1,B_1,C_1)$ if we may write 
$\phi=\phi_1+\phi_2+\phi_3$, where the $\phi_j$'s are computed tensors whose lengths sum to $r$, with the properties 
that 

$\mathrm{Lker}(\phi_1|_{A_1}) = 0, \quad \mathrm{Rker}(\phi_2|_{B_1}) = 0,$ 

and no decomposable tensor appearing in the expression $\phi_1+\phi_2$ takes values in $C_1$. (This 
definition is equivalent to the standard one.) Here for a bilinear map $\psi: A\times B\to C$, $\mathrm{Lker}(\psi) = 
\{a\in A \mid \psi(a,b)=0\ \forall b\in B\}$, and similarly for $\mathrm{Rker}(\psi)\subset B$. 
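
The left and right kernels in this definition are easy to compute once a bilinear map is written in coordinates. The following is a minimal sketch under assumptions made here for illustration: the map is stored as a nested list `t` with `t[i][j][k]` the coefficient of the $k$-th basis vector of $C$ in $\psi(a_i, b_j)$, and the helper names `rank`, `lker_dim`, `rker_dim` and `matmul_tensor` are hypothetical. The dimension of $\mathrm{Lker}(\psi)$ is $\dim A$ minus the rank of the flattening $A\to B^*\otimes C$, and similarly for $\mathrm{Rker}(\psi)$.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    ncols = len(m[0]) if m else 0
    rk = 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                f = m[r][col] / m[rk][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk


def lker_dim(t):
    """dim Lker: dim A minus the rank of the flattening with rows indexed by A."""
    flat = [[x for row in ti for x in row] for ti in t]
    return len(t) - rank(flat)


def rker_dim(t):
    """dim Rker: dim B minus the rank of the flattening with rows indexed by B."""
    nb = len(t[0])
    flat = [[x for ti in t for x in ti[j]] for j in range(nb)]
    return nb - rank(flat)


def matmul_tensor(n):
    """Coefficient array of n x n matrix multiplication, E_ij * E_jk = E_ik."""
    d = n * n
    t = [[[0] * d for _ in range(d)] for _ in range(d)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                t[i * n + j][j * n + k][i * n + k] = 1
    return t


if __name__ == "__main__":
    m2 = matmul_tensor(2)
    print(lker_dim(m2), rker_dim(m2))  # both kernels of matrix multiplication are zero
```

To test the conditions of the definition on a subspace $A_1$ or $B_1$, one would apply the same functions to the coefficient array restricted to a basis of that subspace.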

For $\phi$ as above, the length of $\phi$ is at least $\dim A_1 + \dim B_1$ plus the number of decomposable 
tensors appearing in $\phi_3$ taking values in $C_1$; this is called the Separation Lemma. As this 
observation indicates, separations are useful for obtaining lower bounds for the rank of a tensor. 

If $\mathrm{Lker}(\phi) = 0$, then $\phi$ separates $(A,0,0)$, and similarly for the right kernel. If $\mathrm{Image}(\phi) = C$ 
then $\phi$ separates $(0,0,C)$. Also, if $\phi$ separates $(A',B',C')$ then for any $A''\subset A'$, $B''\subset B'$, 
$C''\subset C'$, $\phi$ separates $(A'',B'',C'')$. 

Lemma 14.4 (Extension lemma). [2] Let $\phi\in A^*\otimes B^*\otimes C$ be a computed tensor that separates 
$(A_1,B_1,C_1)$. Let $A_1\subset A_2\subset A$. If $\phi$ fails to separate $(A_2,B_1,C_1)$, then there exists $a\in A_2\setminus A_1$ 
with 

(10) $\phi(a,B)\subset \phi(a,B_1) + C_1$. 

Of course the same is true with the roles of $A$ and $B$ interchanged. 

Proof. We try to write $\phi = \tilde\phi_1+\tilde\phi_2+\tilde\phi_3$ such that the tilded splitting of $\phi$ separates $(A_2,B_1,C_1)$. 

Write $\phi_3 = \phi_3'+\phi_3''$ with $\mathrm{Image}(\phi_3')\subset C_1$ and $\phi_3'$ maximal with this property. (Note that $\phi_3'$ is 
unique.) Then consider $\psi = \phi_1+\phi_2+\phi_3''$ and say $\psi$ has length $\ell$. Then we have the best chance 
of separating $(A_2,B_1,C_1)$ if we choose $\tilde\phi_2$ of minimal rank such that $\mathrm{Rker}(\tilde\phi_2|_{B_1}) = 0$. Thus the 
length of $\tilde\phi_2$ is $\dim B_1 =: b_1$. There are at most $\binom{\ell}{b_1}$ choices of such $\tilde\phi_2$. Given any admissible 
such choice, the resulting $\tilde\phi_1 := \psi-\tilde\phi_2$ must also have the property that $\mathrm{Lker}(\tilde\phi_1|_{A_1}) = 0$. Say we 
have such a choice and we want to see if the separation extends to $A_2$, i.e., that $\mathrm{Lker}(\tilde\phi_1|_{A_2}) = 0$. 
Now suppose not; then there exists $a\in A_2\setminus A_1$ such that $a\in \mathrm{Lker}(\tilde\phi_1)$, and thus for all $b\in B$ 

$\phi(a,b) = \tilde\phi_2(a,b) + \phi_3'(a,b)$. 

Write $B = B_1\oplus \mathrm{Rker}(\tilde\phi_2)$, and given $b\in B$ write $b = b'+b''$ uniquely with $b'\in B_1$, $b''\in \mathrm{Rker}(\tilde\phi_2)$. So 

$\phi(a,b) = \phi(a,b') + \phi_3'(a,b'') \in \phi(a,B_1) + C_1$. 

So we see that if $\phi$ fails to separate for at least one choice of tilded splitting, then (10) holds. 
In particular, equation (10) holds if it fails for all possible choices. □ 

Here is an easy application of the extension lemma: 

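Proposition 14.5. If $\mathcal{A}$ is a simple algebra and $R\subset\mathcal{A}$ a maximal right ideal, then any 
computation $\phi$ of $\mathrm{Mult}_{\mathcal{A}}$ separates $(R,\mathcal{A},0)$. 

Proof. Here $A = B = C = \mathcal{A}$. Since $\phi$ separates $(\mathcal{A},0,0)$ it separates $(R,0,0)$. Let $B_1\subset B$ be maximal 
such that $\phi$ separates $(R,B_1,0)$. If $B_1\neq B$ then, by the extension lemma, there exists a nonzero $b\in B$ 
such that $\mathcal{A}b\subset RB = R$, a contradiction, as a nonzero left ideal of a simple algebra cannot be 
contained in a proper right ideal. □ 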

As a corollary we obtain a very easy proof that $R(M_{m,m,m}) \geq 2m^2 - m$. 
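
The arithmetic behind this corollary is short. The sketch below spells it out for $\mathcal{A}$ the algebra of $m\times m$ matrices, taking as an illustrative choice of maximal right ideal $R$ the matrices whose first row vanishes.

```latex
% Mult_A for A = m x m matrices is the matrix multiplication tensor M_{m,m,m}.
% R = { X : first row of X is zero } is a right ideal with
\[
\dim R = m^2 - m, \qquad \dim \mathcal{A} = m^2 .
\]
% Proposition 14.5 says any computation separates (R, \mathcal{A}, 0), so the
% Separation Lemma bounds its length from below by
\[
\dim R + \dim \mathcal{A} = (m^2 - m) + m^2 = 2m^2 - m .
\]
% For m = 2 this gives 6 and for m = 3 it gives 15, which is weaker than
% the bound 19 for M_{3,3,3} quoted at the start of this subsection.
```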

Definition 14.6. A more natural and general definition of separation (which, to avoid confusion, 
we call Separation), is as follows: Given $T\in V_1^*\otimes\cdots\otimes V_n^*$, a computation $\phi$ of $T$ and $U_j\subset V_j$, 
we will say $\phi$ Separates $(U_1,\dots,U_n)$ if we have a decomposition $\phi = \phi_1+\cdots+\phi_n+\psi$ with each 

$\phi_j : U_j \to V_1^*\otimes\cdots\otimes V_{j-1}^*\otimes V_{j+1}^*\otimes\cdots\otimes V_n^*$ 

injective and $\mathrm{length}(\phi) = \sum_j \mathrm{length}(\phi_j) + \mathrm{length}(\psi)$. 

If $\phi$ Separates $(A_1,B_1,C_1)$ then the length of $\phi$ is at least $\dim A_1 + \dim B_1 + \dim C_1$, so the 
conclusion of the corresponding Separation lemma is a little stronger than that of the separation 
lemma (but the hypotheses are stronger as well). Note that the hypotheses are also basis 
independent, unlike the separation lemma. 

We leave the statement and proof of the analogous Extension lemma to the reader. 